From: Bobby Eshleman
Cc: Bobby Eshleman, Cong Wang, Jiang Wang, Krasnov Arseniy,
    Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin",
    Jason Wang,
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff Date: Fri, 2 Dec 2022 09:35:18 -0800 Message-Id: <20221202173520.10428-1-bobby.eshleman@bytedance.com> X-Mailer: git-send-email 2.35.1 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable To: unlisted-recipients:; (no To-header on input) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This commit changes virtio/vsock to use sk_buff instead of virtio_vsock_pkt. Beyond better conforming to other net code, using sk_buff allows vsock to use sk_buff-dependent features in the future (such as sockmap) and improves throughput. This patch introduces the following performance changes: Tool/Config: uperf w/ 64 threads, SOCK_STREAM Test Runs: 5, mean of results Before: commit 95ec6bce2a0b ("Merge branch 'net-ipa-more-endpoints'") Test: 64KB, g2h Before: 21.63 Gb/s After: 25.59 Gb/s (+18%) Test: 16B, g2h Before: 11.86 Mb/s After: 17.41 Mb/s (+46%) Test: 64KB, h2g Before: 2.15 Gb/s After: 3.6 Gb/s (+67%) Test: 16B, h2g Before: 14.38 Mb/s After: 18.43 Mb/s (+28%) Signed-off-by: Bobby Eshleman Acked-by: Michael S. Tsirkin Reviewed-by: Stefano Garzarella --- Changes in v5: - last_skb instead of skb: last_hdr->len =3D cpu_to_le32(last_skb->len) Changes in v4: - vdso/bits.h -> linux/bits.h - add virtio_vsock_alloc_skb() helper - virtio/vsock: rename buf_len -> total_len - update last_hdr->len - fix build_skb() for vsockmon (tested) - add queue helpers - use spin_{unlock/lock}_bh() instead of spin_lock()/spin_unlock() - note: I only ran a few g2h tests to check that this change had no perf impact. The above data is still from patch v3. Changes in v3: - fix seqpacket bug - use zero in vhost_add_used(..., 0) device doesn't write to buffer - use xmas tree style declarations - vsock_hdr() -> virtio_vsock_hdr() and other include file style fixes - no skb merging - save space by not using vsock_metadata - use _skb_refdst instead of skb buffer space for flags - use skb_pull() to keep track of read bytes instead of using an an extra variable 'off' in the skb buffer space - remove unnecessary sk_allocation assignment - do not zero hdr needlessly - introduce virtio_transport_skb_len() because skb->len changes now - use spin_lock() directly on queue lock instead of sk_buff_head helpers which use spin_lock_irqsave() (e.g., skb_dequeue) - do not reduce buffer size to be page size divisible - Note: the biggest performance change came from loosening the spinlock variation and not reducing the buffer size. Changes in v2: - Use alloc_skb() directly instead of sock_alloc_send_pskb() to minimize uAPI changes. - Do not marshal errors to -ENOMEM for non-virtio implementations. 
- No longer a part of the original series - Some code cleanup and refactoring - Include performance stats --- drivers/vhost/vsock.c | 213 +++++------- include/linux/virtio_vsock.h | 145 ++++++-- net/vmw_vsock/virtio_transport.c | 149 +++------ net/vmw_vsock/virtio_transport_common.c | 422 +++++++++++++----------- net/vmw_vsock/vsock_loopback.c | 51 +-- 5 files changed, 514 insertions(+), 466 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 5703775af129..04b28c4c58d0 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -51,8 +51,7 @@ struct vhost_vsock { struct hlist_node hash; =20 struct vhost_work send_pkt_work; - spinlock_t send_pkt_list_lock; - struct list_head send_pkt_list; /* host->guest pending packets */ + struct sk_buff_head send_pkt_queue; /* host->guest pending packets */ =20 atomic_t queued_replies; =20 @@ -108,40 +107,33 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, vhost_disable_notify(&vsock->dev, vq); =20 do { - struct virtio_vsock_pkt *pkt; + struct virtio_vsock_hdr *hdr; + size_t iov_len, payload_len; struct iov_iter iov_iter; + u32 flags_to_restore =3D 0; + struct sk_buff *skb; unsigned out, in; size_t nbytes; - size_t iov_len, payload_len; int head; - u32 flags_to_restore =3D 0; =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - if (list_empty(&vsock->send_pkt_list)) { - spin_unlock_bh(&vsock->send_pkt_list_lock); + spin_lock(&vsock->send_pkt_queue.lock); + skb =3D __skb_dequeue(&vsock->send_pkt_queue); + spin_unlock(&vsock->send_pkt_queue.lock); + + if (!skb) { vhost_enable_notify(&vsock->dev, vq); break; } =20 - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - head =3D vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); if (head < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); break; } =20 if (head =3D=3D vq->num) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); /* We cannot finish yet if more buffers snuck in while * re-enabling notify. */ @@ -153,26 +145,27 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, } =20 if (out) { - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); vq_err(vq, "Expected 0 output buffers, got %u\n", out); break; } =20 iov_len =3D iov_length(&vq->iov[out], in); - if (iov_len < sizeof(pkt->hdr)) { - virtio_transport_free_pkt(pkt); + if (iov_len < sizeof(*hdr)) { + virtio_vsock_kfree_skb(skb); vq_err(vq, "Buffer len [%zu] too small\n", iov_len); break; } =20 iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len); - payload_len =3D pkt->len - pkt->off; + payload_len =3D skb->len; + hdr =3D virtio_vsock_hdr(skb); =20 /* If the packet is greater than the space available in the * buffer, we split it using multiple buffers. */ - if (payload_len > iov_len - sizeof(pkt->hdr)) { - payload_len =3D iov_len - sizeof(pkt->hdr); + if (payload_len > iov_len - sizeof(*hdr)) { + payload_len =3D iov_len - sizeof(*hdr); =20 /* As we are copying pieces of large packet's buffer to * small rx buffers, headers of packets in rx queue are @@ -185,31 +178,30 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, * bits set. 
After initialized header will be copied to * rx buffer, these required bits will be restored. */ - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) { - pkt->hdr.flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { + hdr->flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); flags_to_restore |=3D VIRTIO_VSOCK_SEQ_EOM; =20 - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) { - pkt->hdr.flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) { + hdr->flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); flags_to_restore |=3D VIRTIO_VSOCK_SEQ_EOR; } } } =20 /* Set the correct length in the header */ - pkt->hdr.len =3D cpu_to_le32(payload_len); + hdr->len =3D cpu_to_le32(payload_len); =20 - nbytes =3D copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); - if (nbytes !=3D sizeof(pkt->hdr)) { - virtio_transport_free_pkt(pkt); + nbytes =3D copy_to_iter(hdr, sizeof(*hdr), &iov_iter); + if (nbytes !=3D sizeof(*hdr)) { + virtio_vsock_kfree_skb(skb); vq_err(vq, "Faulted on copying pkt hdr\n"); break; } =20 - nbytes =3D copy_to_iter(pkt->buf + pkt->off, payload_len, - &iov_iter); + nbytes =3D copy_to_iter(skb->data, payload_len, &iov_iter); if (nbytes !=3D payload_len) { - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); vq_err(vq, "Faulted on copying pkt buf\n"); break; } @@ -217,31 +209,28 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, /* Deliver to monitoring devices all packets that we * will transmit. */ - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); =20 - vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len); + vhost_add_used(vq, head, sizeof(*hdr) + payload_len); added =3D true; =20 - pkt->off +=3D payload_len; + skb_pull(skb, payload_len); total_len +=3D payload_len; =20 /* If we didn't send all the payload we can requeue the packet * to send it with the next available buffer. */ - if (pkt->off < pkt->len) { - pkt->hdr.flags |=3D cpu_to_le32(flags_to_restore); + if (skb->len > 0) { + hdr->flags |=3D cpu_to_le32(flags_to_restore); =20 - /* We are queueing the same virtio_vsock_pkt to handle + /* We are queueing the same skb to handle * the remaining bytes, and we want to deliver it * to monitoring devices in the next iteration. 
*/ - pkt->tap_delivered =3D false; - - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_clear_tap_delivered(skb); + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); } else { - if (pkt->reply) { + if (virtio_vsock_skb_reply(skb)) { int val; =20 val =3D atomic_dec_return(&vsock->queued_replies); @@ -253,7 +242,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, restart_tx =3D true; } =20 - virtio_transport_free_pkt(pkt); + virtio_vsock_consume_skb(skb); } } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); if (added) @@ -278,28 +267,26 @@ static void vhost_transport_send_pkt_work(struct vhos= t_work *work) } =20 static int -vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt) +vhost_transport_send_pkt(struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct vhost_vsock *vsock; - int len =3D pkt->len; + int len =3D skb->len; =20 rcu_read_lock(); =20 /* Find the vhost_vsock according to guest context id */ - vsock =3D vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid)); + vsock =3D vhost_vsock_get(le64_to_cpu(hdr->dst_cid)); if (!vsock) { rcu_read_unlock(); - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); return -ENODEV; } =20 - if (pkt->reply) + if (virtio_vsock_skb_reply(skb)) atomic_inc(&vsock->queued_replies); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add_tail(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb); vhost_work_queue(&vsock->dev, &vsock->send_pkt_work); =20 rcu_read_unlock(); @@ -310,10 +297,8 @@ static int vhost_transport_cancel_pkt(struct vsock_sock *vsk) { struct vhost_vsock *vsock; - struct virtio_vsock_pkt *pkt, *n; int cnt =3D 0; int ret =3D -ENODEV; - LIST_HEAD(freeme); =20 rcu_read_lock(); =20 @@ -322,20 +307,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) if (!vsock) goto out; =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) { - if (pkt->vsk !=3D vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - if (pkt->reply) - cnt++; - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + cnt =3D virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue); =20 if (cnt) { struct vhost_virtqueue *tx_vq =3D &vsock->vqs[VSOCK_VQ_TX]; @@ -352,12 +324,14 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) return ret; } =20 -static struct virtio_vsock_pkt * -vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, +static struct sk_buff * +vhost_vsock_alloc_skb(struct vhost_virtqueue *vq, unsigned int out, unsigned int in) { - struct virtio_vsock_pkt *pkt; + struct virtio_vsock_hdr *hdr; struct iov_iter iov_iter; + struct sk_buff *skb; + size_t payload_len; size_t nbytes; size_t len; =20 @@ -366,50 +340,47 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } =20 - pkt =3D kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + len =3D iov_length(vq->iov, out); + + /* len contains both payload and hdr */ + skb =3D virtio_vsock_alloc_skb(len, GFP_KERNEL); + if (!skb) return NULL; =20 - len =3D iov_length(vq->iov, out); iov_iter_init(&iov_iter, WRITE, vq->iov, out, len); =20 - nbytes =3D copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); - if (nbytes !=3D sizeof(pkt->hdr)) { + hdr =3D virtio_vsock_hdr(skb); + nbytes =3D 
copy_from_iter(hdr, sizeof(*hdr), &iov_iter); + if (nbytes !=3D sizeof(*hdr)) { vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n", - sizeof(pkt->hdr), nbytes); - kfree(pkt); + sizeof(*hdr), nbytes); + virtio_vsock_kfree_skb(skb); return NULL; } =20 - pkt->len =3D le32_to_cpu(pkt->hdr.len); + payload_len =3D le32_to_cpu(hdr->len); =20 /* No payload */ - if (!pkt->len) - return pkt; + if (!payload_len) + return skb; =20 /* The pkt is too big */ - if (pkt->len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { - kfree(pkt); + if (payload_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { + virtio_vsock_kfree_skb(skb); return NULL; } =20 - pkt->buf =3D kvmalloc(pkt->len, GFP_KERNEL); - if (!pkt->buf) { - kfree(pkt); - return NULL; - } + virtio_vsock_skb_rx_put(skb); =20 - pkt->buf_len =3D pkt->len; - - nbytes =3D copy_from_iter(pkt->buf, pkt->len, &iov_iter); - if (nbytes !=3D pkt->len) { - vq_err(vq, "Expected %u byte payload, got %zu bytes\n", - pkt->len, nbytes); - virtio_transport_free_pkt(pkt); + nbytes =3D copy_from_iter(skb->data, payload_len, &iov_iter); + if (nbytes !=3D payload_len) { + vq_err(vq, "Expected %zu byte payload, got %zu bytes\n", + payload_len, nbytes); + virtio_vsock_kfree_skb(skb); return NULL; } =20 - return pkt; + return skb; } =20 /* Is there space left for replies to rx packets? */ @@ -496,9 +467,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_wor= k *work) poll.work); struct vhost_vsock *vsock =3D container_of(vq->dev, struct vhost_vsock, dev); - struct virtio_vsock_pkt *pkt; int head, pkts =3D 0, total_len =3D 0; unsigned int out, in; + struct sk_buff *skb; bool added =3D false; =20 mutex_lock(&vq->mutex); @@ -511,6 +482,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_wor= k *work) =20 vhost_disable_notify(&vsock->dev, vq); do { + struct virtio_vsock_hdr *hdr; + if (!vhost_vsock_more_replies(vsock)) { /* Stop tx until the device processes already * pending replies. 
Leave tx virtqueue @@ -532,24 +505,26 @@ static void vhost_vsock_handle_tx_kick(struct vhost_w= ork *work) break; } =20 - pkt =3D vhost_vsock_alloc_pkt(vq, out, in); - if (!pkt) { + skb =3D vhost_vsock_alloc_skb(vq, out, in); + if (!skb) { vq_err(vq, "Faulted on pkt\n"); continue; } =20 - total_len +=3D sizeof(pkt->hdr) + pkt->len; + total_len +=3D sizeof(*hdr) + skb->len; =20 /* Deliver to monitoring devices all received packets */ - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); + + hdr =3D virtio_vsock_hdr(skb); =20 /* Only accept correctly addressed packets */ - if (le64_to_cpu(pkt->hdr.src_cid) =3D=3D vsock->guest_cid && - le64_to_cpu(pkt->hdr.dst_cid) =3D=3D + if (le64_to_cpu(hdr->src_cid) =3D=3D vsock->guest_cid && + le64_to_cpu(hdr->dst_cid) =3D=3D vhost_transport_get_local_cid()) - virtio_transport_recv_pkt(&vhost_transport, pkt); + virtio_transport_recv_pkt(&vhost_transport, skb); else - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); =20 vhost_add_used(vq, head, 0); added =3D true; @@ -693,8 +668,7 @@ static int vhost_vsock_dev_open(struct inode *inode, st= ruct file *file) VHOST_VSOCK_WEIGHT, true, NULL); =20 file->private_data =3D vsock; - spin_lock_init(&vsock->send_pkt_list_lock); - INIT_LIST_HEAD(&vsock->send_pkt_list); + skb_queue_head_init(&vsock->send_pkt_queue); vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work); return 0; =20 @@ -760,16 +734,7 @@ static int vhost_vsock_dev_release(struct inode *inode= , struct file *file) vhost_vsock_flush(vsock); vhost_dev_stop(&vsock->dev); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - while (!list_empty(&vsock->send_pkt_list)) { - struct virtio_vsock_pkt *pkt; - - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - virtio_transport_free_pkt(pkt); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue); =20 vhost_dev_cleanup(&vsock->dev); kfree(vsock->dev.vqs); diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 35d7eedb5e8e..6c0b2d4da3fe 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -3,10 +3,129 @@ #define _LINUX_VIRTIO_VSOCK_H =20 #include +#include #include #include #include =20 +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr)) + +enum virtio_vsock_skb_flags { + VIRTIO_VSOCK_SKB_FLAGS_REPLY =3D BIT(0), + VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED =3D BIT(1), +}; + +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *sk= b) +{ + return (struct virtio_vsock_hdr *)skb->head; +} + +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb) +{ + return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_REPLY; +} + +static inline void virtio_vsock_skb_set_reply(struct sk_buff *skb) +{ + skb->_skb_refdst |=3D VIRTIO_VSOCK_SKB_FLAGS_REPLY; +} + +static inline bool virtio_vsock_skb_tap_delivered(struct sk_buff *skb) +{ + return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED; +} + +static inline void virtio_vsock_skb_set_tap_delivered(struct sk_buff *skb) +{ + skb->_skb_refdst |=3D VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED; +} + +static inline void virtio_vsock_skb_clear_tap_delivered(struct sk_buff *sk= b) +{ + skb->_skb_refdst &=3D ~VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED; +} + +static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb) +{ + u32 len; + + len =3D le32_to_cpu(virtio_vsock_hdr(skb)->len); + + if (len > 0) + skb_put(skb, len); +} + +static inline struct 
sk_buff *virtio_vsock_alloc_skb(unsigned int size, gf= p_t mask) +{ + struct sk_buff *skb; + + skb =3D alloc_skb(size, mask); + if (!skb) + return NULL; + + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM); + return skb; +} + +static inline void virtio_vsock_kfree_skb(struct sk_buff *skb) +{ + skb->_skb_refdst =3D 0; + kfree_skb(skb); +} + +static inline void +virtio_vsock_skb_queue_head(struct sk_buff_head *list, struct sk_buff *skb) +{ + spin_lock_bh(&list->lock); + __skb_queue_head(list, skb); + spin_unlock_bh(&list->lock); +} + +static inline void +virtio_vsock_skb_queue_tail(struct sk_buff_head *list, struct sk_buff *skb) +{ + spin_lock_bh(&list->lock); + __skb_queue_tail(list, skb); + spin_unlock_bh(&list->lock); +} + +static inline struct sk_buff *virtio_vsock_skb_dequeue(struct sk_buff_head= *list) +{ + struct sk_buff *skb; + + spin_lock_bh(&list->lock); + skb =3D __skb_dequeue(list); + spin_unlock_bh(&list->lock); + + return skb; +} + +static inline void __virtio_vsock_skb_queue_purge(struct sk_buff_head *lis= t) +{ + struct sk_buff *skb; + + while ((skb =3D __skb_dequeue(list)) !=3D NULL) + virtio_vsock_kfree_skb(skb); +} + +static inline void virtio_vsock_skb_queue_purge(struct sk_buff_head *list) +{ + spin_lock_bh(&list->lock); + __virtio_vsock_skb_queue_purge(list); + spin_unlock_bh(&list->lock); +} + +static inline size_t virtio_vsock_skb_len(struct sk_buff *skb) +{ + return (size_t)(skb_end_pointer(skb) - skb->head); +} + +static inline void virtio_vsock_consume_skb(struct sk_buff *skb) +{ + skb->_skb_refdst =3D 0; + consume_skb(skb); +} + #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4) #define VIRTIO_VSOCK_MAX_BUF_SIZE 0xFFFFFFFFUL #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64) @@ -35,23 +154,10 @@ struct virtio_vsock_sock { u32 last_fwd_cnt; u32 rx_bytes; u32 buf_alloc; - struct list_head rx_queue; + struct sk_buff_head rx_queue; u32 msg_count; }; =20 -struct virtio_vsock_pkt { - struct virtio_vsock_hdr hdr; - struct list_head list; - /* socket refcnt not held, only use for cancellation */ - struct vsock_sock *vsk; - void *buf; - u32 buf_len; - u32 len; - u32 off; - bool reply; - bool tap_delivered; -}; - struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; @@ -68,7 +174,7 @@ struct virtio_transport { struct vsock_transport transport; =20 /* Takes ownership of the packet */ - int (*send_pkt)(struct virtio_vsock_pkt *pkt); + int (*send_pkt)(struct sk_buff *skb); }; =20 ssize_t @@ -149,11 +255,10 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk); =20 void virtio_transport_recv_pkt(struct virtio_transport *t, - struct virtio_vsock_pkt *pkt); -void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt); -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct vir= tio_vsock_pkt *pkt); + struct sk_buff *skb); +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_= buff *skb); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit= ); -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt); - +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb); +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list); #endif /* _LINUX_VIRTIO_VSOCK_H */ diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transp= ort.c index ad64f403536a..ee0c3c91c06b 100644 --- a/net/vmw_vsock/virtio_transport.c +++ 
b/net/vmw_vsock/virtio_transport.c @@ -42,8 +42,7 @@ struct virtio_vsock { bool tx_run; =20 struct work_struct send_pkt_work; - spinlock_t send_pkt_list_lock; - struct list_head send_pkt_list; + struct sk_buff_head send_pkt_queue; =20 atomic_t queued_replies; =20 @@ -101,41 +100,31 @@ virtio_transport_send_pkt_work(struct work_struct *wo= rk) vq =3D vsock->vqs[VSOCK_VQ_TX]; =20 for (;;) { - struct virtio_vsock_pkt *pkt; struct scatterlist hdr, buf, *sgs[2]; int ret, in_sg =3D 0, out_sg =3D 0; + struct sk_buff *skb; bool reply; =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - if (list_empty(&vsock->send_pkt_list)) { - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb =3D virtio_vsock_skb_dequeue(&vsock->send_pkt_queue); + if (!skb) break; - } - - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - spin_unlock_bh(&vsock->send_pkt_list_lock); =20 - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); + reply =3D virtio_vsock_skb_reply(skb); =20 - reply =3D pkt->reply; - - sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); + sg_init_one(&hdr, virtio_vsock_hdr(skb), sizeof(*virtio_vsock_hdr(skb))); sgs[out_sg++] =3D &hdr; - if (pkt->buf) { - sg_init_one(&buf, pkt->buf, pkt->len); + if (skb->len > 0) { + sg_init_one(&buf, skb->data, skb->len); sgs[out_sg++] =3D &buf; } =20 - ret =3D virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL); + ret =3D virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, GFP_KERNEL); /* Usually this means that there is no more space available in * the vq */ if (ret < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); break; } =20 @@ -164,32 +153,32 @@ virtio_transport_send_pkt_work(struct work_struct *wo= rk) } =20 static int -virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt) +virtio_transport_send_pkt(struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr; struct virtio_vsock *vsock; - int len =3D pkt->len; + int len =3D skb->len; + + hdr =3D virtio_vsock_hdr(skb); =20 rcu_read_lock(); vsock =3D rcu_dereference(the_virtio_vsock); if (!vsock) { - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); len =3D -ENODEV; goto out_rcu; } =20 - if (le64_to_cpu(pkt->hdr.dst_cid) =3D=3D vsock->guest_cid) { - virtio_transport_free_pkt(pkt); + if (le64_to_cpu(hdr->dst_cid) =3D=3D vsock->guest_cid) { + virtio_vsock_kfree_skb(skb); len =3D -ENODEV; goto out_rcu; } =20 - if (pkt->reply) + if (virtio_vsock_skb_reply(skb)) atomic_inc(&vsock->queued_replies); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add_tail(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb); queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work); =20 out_rcu: @@ -201,9 +190,7 @@ static int virtio_transport_cancel_pkt(struct vsock_sock *vsk) { struct virtio_vsock *vsock; - struct virtio_vsock_pkt *pkt, *n; int cnt =3D 0, ret; - LIST_HEAD(freeme); =20 rcu_read_lock(); vsock =3D rcu_dereference(the_virtio_vsock); @@ -212,20 +199,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk) goto out_rcu; } =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) { - if (pkt->vsk !=3D vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); - - list_for_each_entry_safe(pkt, n, 
&freeme, list) { - if (pkt->reply) - cnt++; - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + cnt =3D virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue); =20 if (cnt) { struct virtqueue *rx_vq =3D vsock->vqs[VSOCK_VQ_RX]; @@ -246,38 +220,28 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk) =20 static void virtio_vsock_rx_fill(struct virtio_vsock *vsock) { - int buf_len =3D VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE; - struct virtio_vsock_pkt *pkt; - struct scatterlist hdr, buf, *sgs[2]; + int total_len =3D VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE + VIRTIO_VSOCK_SKB_HEA= DROOM; + struct scatterlist pkt, *p; struct virtqueue *vq; + struct sk_buff *skb; int ret; =20 vq =3D vsock->vqs[VSOCK_VQ_RX]; =20 do { - pkt =3D kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + skb =3D virtio_vsock_alloc_skb(total_len, GFP_KERNEL); + if (!skb) break; =20 - pkt->buf =3D kmalloc(buf_len, GFP_KERNEL); - if (!pkt->buf) { - virtio_transport_free_pkt(pkt); + memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM); + sg_init_one(&pkt, virtio_vsock_hdr(skb), total_len); + p =3D &pkt; + ret =3D virtqueue_add_sgs(vq, &p, 0, 1, skb, GFP_KERNEL); + if (ret < 0) { + virtio_vsock_kfree_skb(skb); break; } =20 - pkt->buf_len =3D buf_len; - pkt->len =3D buf_len; - - sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); - sgs[0] =3D &hdr; - - sg_init_one(&buf, pkt->buf, buf_len); - sgs[1] =3D &buf; - ret =3D virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL); - if (ret) { - virtio_transport_free_pkt(pkt); - break; - } vsock->rx_buf_nr++; } while (vq->num_free); if (vsock->rx_buf_nr > vsock->rx_buf_max_nr) @@ -299,12 +263,12 @@ static void virtio_transport_tx_work(struct work_stru= ct *work) goto out; =20 do { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; unsigned int len; =20 virtqueue_disable_cb(vq); - while ((pkt =3D virtqueue_get_buf(vq, &len)) !=3D NULL) { - virtio_transport_free_pkt(pkt); + while ((skb =3D virtqueue_get_buf(vq, &len)) !=3D NULL) { + virtio_vsock_consume_skb(skb); added =3D true; } } while (!virtqueue_enable_cb(vq)); @@ -529,7 +493,7 @@ static void virtio_transport_rx_work(struct work_struct= *work) do { virtqueue_disable_cb(vq); for (;;) { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; unsigned int len; =20 if (!virtio_transport_more_replies(vsock)) { @@ -540,23 +504,22 @@ static void virtio_transport_rx_work(struct work_stru= ct *work) goto out; } =20 - pkt =3D virtqueue_get_buf(vq, &len); - if (!pkt) { + skb =3D virtqueue_get_buf(vq, &len); + if (!skb) break; - } =20 vsock->rx_buf_nr--; =20 /* Drop short/long packets */ - if (unlikely(len < sizeof(pkt->hdr) || - len > sizeof(pkt->hdr) + pkt->len)) { - virtio_transport_free_pkt(pkt); + if (unlikely(len < sizeof(struct virtio_vsock_hdr) || + len > virtio_vsock_skb_len(skb))) { + virtio_vsock_kfree_skb(skb); continue; } =20 - pkt->len =3D len - sizeof(pkt->hdr); - virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&virtio_transport, pkt); + virtio_vsock_skb_rx_put(skb); + virtio_transport_deliver_tap_pkt(skb); + virtio_transport_recv_pkt(&virtio_transport, skb); } } while (!virtqueue_enable_cb(vq)); =20 @@ -610,7 +573,7 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *v= sock) static void virtio_vsock_vqs_del(struct virtio_vsock *vsock) { struct virtio_device *vdev =3D vsock->vdev; - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; =20 /* Reset all connected sockets when the VQs disappear */ vsock_for_each_connected_socket(&virtio_transport.transport, @@ -637,23 +600,16 @@ static void virtio_vsock_vqs_del(struct 
virtio_vsock = *vsock) virtio_reset_device(vdev); =20 mutex_lock(&vsock->rx_lock); - while ((pkt =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) - virtio_transport_free_pkt(pkt); + while ((skb =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) + virtio_vsock_kfree_skb(skb); mutex_unlock(&vsock->rx_lock); =20 mutex_lock(&vsock->tx_lock); - while ((pkt =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX]))) - virtio_transport_free_pkt(pkt); + while ((skb =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX]))) + virtio_vsock_kfree_skb(skb); mutex_unlock(&vsock->tx_lock); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - while (!list_empty(&vsock->send_pkt_list)) { - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue); =20 /* Delete virtqueues and flush outstanding callbacks if any */ vdev->config->del_vqs(vdev); @@ -690,8 +646,7 @@ static int virtio_vsock_probe(struct virtio_device *vde= v) mutex_init(&vsock->tx_lock); mutex_init(&vsock->rx_lock); mutex_init(&vsock->event_lock); - spin_lock_init(&vsock->send_pkt_list_lock); - INIT_LIST_HEAD(&vsock->send_pkt_list); + skb_queue_head_init(&vsock->send_pkt_queue); INIT_WORK(&vsock->rx_work, virtio_transport_rx_work); INIT_WORK(&vsock->tx_work, virtio_transport_tx_work); INIT_WORK(&vsock->event_work, virtio_transport_event_work); diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio= _transport_common.c index a9980e9b9304..489dcbf81f4f 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -37,53 +37,56 @@ virtio_transport_get_ops(struct vsock_sock *vsk) return container_of(t, struct virtio_transport, transport); } =20 -static struct virtio_vsock_pkt * -virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, +/* Returns a new packet on success, otherwise returns NULL. + * + * If NULL is returned, errp is set to a negative errno. 
+ */ +static struct sk_buff * +virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, size_t len, u32 src_cid, u32 src_port, u32 dst_cid, u32 dst_port) { - struct virtio_vsock_pkt *pkt; + const size_t skb_len =3D VIRTIO_VSOCK_SKB_HEADROOM + len; + struct virtio_vsock_hdr *hdr; + struct sk_buff *skb; + void *payload; int err; =20 - pkt =3D kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + skb =3D virtio_vsock_alloc_skb(skb_len, GFP_KERNEL); + if (!skb) return NULL; =20 - pkt->hdr.type =3D cpu_to_le16(info->type); - pkt->hdr.op =3D cpu_to_le16(info->op); - pkt->hdr.src_cid =3D cpu_to_le64(src_cid); - pkt->hdr.dst_cid =3D cpu_to_le64(dst_cid); - pkt->hdr.src_port =3D cpu_to_le32(src_port); - pkt->hdr.dst_port =3D cpu_to_le32(dst_port); - pkt->hdr.flags =3D cpu_to_le32(info->flags); - pkt->len =3D len; - pkt->hdr.len =3D cpu_to_le32(len); - pkt->reply =3D info->reply; - pkt->vsk =3D info->vsk; + hdr =3D virtio_vsock_hdr(skb); + hdr->type =3D cpu_to_le16(info->type); + hdr->op =3D cpu_to_le16(info->op); + hdr->src_cid =3D cpu_to_le64(src_cid); + hdr->dst_cid =3D cpu_to_le64(dst_cid); + hdr->src_port =3D cpu_to_le32(src_port); + hdr->dst_port =3D cpu_to_le32(dst_port); + hdr->flags =3D cpu_to_le32(info->flags); + hdr->len =3D cpu_to_le32(len); =20 if (info->msg && len > 0) { - pkt->buf =3D kmalloc(len, GFP_KERNEL); - if (!pkt->buf) - goto out_pkt; - - pkt->buf_len =3D len; - - err =3D memcpy_from_msg(pkt->buf, info->msg, len); + payload =3D skb_put(skb, len); + err =3D memcpy_from_msg(payload, info->msg, len); if (err) goto out; =20 if (msg_data_left(info->msg) =3D=3D 0 && info->type =3D=3D VIRTIO_VSOCK_TYPE_SEQPACKET) { - pkt->hdr.flags |=3D cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + hdr->flags |=3D cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); =20 if (info->msg->msg_flags & MSG_EOR) - pkt->hdr.flags |=3D cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); + hdr->flags |=3D cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); } } =20 + if (info->reply) + virtio_vsock_skb_set_reply(skb); + trace_virtio_transport_alloc_pkt(src_cid, src_port, dst_cid, dst_port, len, @@ -91,19 +94,18 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info= *info, info->op, info->flags); =20 - return pkt; + return skb; =20 out: - kfree(pkt->buf); -out_pkt: - kfree(pkt); + virtio_vsock_kfree_skb(skb); return NULL; } =20 /* Packet capture */ static struct sk_buff *virtio_transport_build_skb(void *opaque) { - struct virtio_vsock_pkt *pkt =3D opaque; + struct virtio_vsock_hdr *pkt_hdr; + struct sk_buff *pkt =3D opaque; struct af_vsockmon_hdr *hdr; struct sk_buff *skb; size_t payload_len; @@ -113,10 +115,11 @@ static struct sk_buff *virtio_transport_build_skb(voi= d *opaque) * the payload length from the header and the buffer pointer taking * care of the offset in the original packet. 
*/ - payload_len =3D le32_to_cpu(pkt->hdr.len); - payload_buf =3D pkt->buf + pkt->off; + pkt_hdr =3D virtio_vsock_hdr(pkt); + payload_len =3D pkt->len; + payload_buf =3D pkt->data; =20 - skb =3D alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len, + skb =3D alloc_skb(sizeof(*hdr) + sizeof(*pkt_hdr) + payload_len, GFP_ATOMIC); if (!skb) return NULL; @@ -124,16 +127,16 @@ static struct sk_buff *virtio_transport_build_skb(voi= d *opaque) hdr =3D skb_put(skb, sizeof(*hdr)); =20 /* pkt->hdr is little-endian so no need to byteswap here */ - hdr->src_cid =3D pkt->hdr.src_cid; - hdr->src_port =3D pkt->hdr.src_port; - hdr->dst_cid =3D pkt->hdr.dst_cid; - hdr->dst_port =3D pkt->hdr.dst_port; + hdr->src_cid =3D pkt_hdr->src_cid; + hdr->src_port =3D pkt_hdr->src_port; + hdr->dst_cid =3D pkt_hdr->dst_cid; + hdr->dst_port =3D pkt_hdr->dst_port; =20 hdr->transport =3D cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO); - hdr->len =3D cpu_to_le16(sizeof(pkt->hdr)); + hdr->len =3D cpu_to_le16(sizeof(*pkt_hdr)); memset(hdr->reserved, 0, sizeof(hdr->reserved)); =20 - switch (le16_to_cpu(pkt->hdr.op)) { + switch (le16_to_cpu(pkt_hdr->op)) { case VIRTIO_VSOCK_OP_REQUEST: case VIRTIO_VSOCK_OP_RESPONSE: hdr->op =3D cpu_to_le16(AF_VSOCK_OP_CONNECT); @@ -154,7 +157,7 @@ static struct sk_buff *virtio_transport_build_skb(void = *opaque) break; } =20 - skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr)); + skb_put_data(skb, pkt_hdr, sizeof(*pkt_hdr)); =20 if (payload_len) { skb_put_data(skb, payload_buf, payload_len); @@ -163,13 +166,13 @@ static struct sk_buff *virtio_transport_build_skb(voi= d *opaque) return skb; } =20 -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt) +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb) { - if (pkt->tap_delivered) + if (virtio_vsock_skb_tap_delivered(skb)) return; =20 - vsock_deliver_tap(virtio_transport_build_skb, pkt); - pkt->tap_delivered =3D true; + vsock_deliver_tap(virtio_transport_build_skb, skb); + virtio_vsock_skb_set_tap_delivered(skb); } EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt); =20 @@ -192,8 +195,8 @@ static int virtio_transport_send_pkt_info(struct vsock_= sock *vsk, u32 src_cid, src_port, dst_cid, dst_port; const struct virtio_transport *t_ops; struct virtio_vsock_sock *vvs; - struct virtio_vsock_pkt *pkt; u32 pkt_len =3D info->pkt_len; + struct sk_buff *skb; =20 info->type =3D virtio_transport_get_type(sk_vsock(vsk)); =20 @@ -224,42 +227,47 @@ static int virtio_transport_send_pkt_info(struct vsoc= k_sock *vsk, if (pkt_len =3D=3D 0 && info->op =3D=3D VIRTIO_VSOCK_OP_RW) return pkt_len; =20 - pkt =3D virtio_transport_alloc_pkt(info, pkt_len, + skb =3D virtio_transport_alloc_skb(info, pkt_len, src_cid, src_port, dst_cid, dst_port); - if (!pkt) { + if (!skb) { virtio_transport_put_credit(vvs, pkt_len); return -ENOMEM; } =20 - virtio_transport_inc_tx_pkt(vvs, pkt); + virtio_transport_inc_tx_pkt(vvs, skb); =20 - return t_ops->send_pkt(pkt); + return t_ops->send_pkt(skb); } =20 static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { - if (vvs->rx_bytes + pkt->len > vvs->buf_alloc) + if (vvs->rx_bytes + skb->len > vvs->buf_alloc) return false; =20 - vvs->rx_bytes +=3D pkt->len; + vvs->rx_bytes +=3D skb->len; return true; } =20 static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { - vvs->rx_bytes -=3D pkt->len; - vvs->fwd_cnt +=3D pkt->len; + int len; + + len =3D skb_headroom(skb) - sizeof(struct 
virtio_vsock_hdr) - skb->len; + vvs->rx_bytes -=3D len; + vvs->fwd_cnt +=3D len; } =20 -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct vir= tio_vsock_pkt *pkt) +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_= buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); + spin_lock_bh(&vvs->rx_lock); vvs->last_fwd_cnt =3D vvs->fwd_cnt; - pkt->hdr.fwd_cnt =3D cpu_to_le32(vvs->fwd_cnt); - pkt->hdr.buf_alloc =3D cpu_to_le32(vvs->buf_alloc); + hdr->fwd_cnt =3D cpu_to_le32(vvs->fwd_cnt); + hdr->buf_alloc =3D cpu_to_le32(vvs->buf_alloc); spin_unlock_bh(&vvs->rx_lock); } EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt); @@ -303,29 +311,29 @@ virtio_transport_stream_do_peek(struct vsock_sock *vs= k, size_t len) { struct virtio_vsock_sock *vvs =3D vsk->trans; - struct virtio_vsock_pkt *pkt; size_t bytes, total =3D 0, off; + struct sk_buff *skb, *tmp; int err =3D -EFAULT; =20 spin_lock_bh(&vvs->rx_lock); =20 - list_for_each_entry(pkt, &vvs->rx_queue, list) { - off =3D pkt->off; + skb_queue_walk_safe(&vvs->rx_queue, skb, tmp) { + off =3D 0; =20 if (total =3D=3D len) break; =20 - while (total < len && off < pkt->len) { + while (total < len && off < skb->len) { bytes =3D len - total; - if (bytes > pkt->len - off) - bytes =3D pkt->len - off; + if (bytes > skb->len - off) + bytes =3D skb->len - off; =20 /* sk_lock is held by caller so no one else can dequeue. * Unlock rx_lock since memcpy_to_msg() may sleep. */ spin_unlock_bh(&vvs->rx_lock); =20 - err =3D memcpy_to_msg(msg, pkt->buf + off, bytes); + err =3D memcpy_to_msg(msg, skb->data + off, bytes); if (err) goto out; =20 @@ -352,37 +360,38 @@ virtio_transport_stream_do_dequeue(struct vsock_sock = *vsk, size_t len) { struct virtio_vsock_sock *vvs =3D vsk->trans; - struct virtio_vsock_pkt *pkt; size_t bytes, total =3D 0; - u32 free_space; + struct sk_buff *skb; int err =3D -EFAULT; + u32 free_space; =20 spin_lock_bh(&vvs->rx_lock); - while (total < len && !list_empty(&vvs->rx_queue)) { - pkt =3D list_first_entry(&vvs->rx_queue, - struct virtio_vsock_pkt, list); + while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) { + skb =3D __skb_dequeue(&vvs->rx_queue); =20 bytes =3D len - total; - if (bytes > pkt->len - pkt->off) - bytes =3D pkt->len - pkt->off; + if (bytes > skb->len) + bytes =3D skb->len; =20 /* sk_lock is held by caller so no one else can dequeue. * Unlock rx_lock since memcpy_to_msg() may sleep. 
*/ spin_unlock_bh(&vvs->rx_lock); =20 - err =3D memcpy_to_msg(msg, pkt->buf + pkt->off, bytes); + err =3D memcpy_to_msg(msg, skb->data, bytes); if (err) goto out; =20 spin_lock_bh(&vvs->rx_lock); =20 total +=3D bytes; - pkt->off +=3D bytes; - if (pkt->off =3D=3D pkt->len) { - virtio_transport_dec_rx_pkt(vvs, pkt); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); + skb_pull(skb, bytes); + + if (skb->len =3D=3D 0) { + virtio_transport_dec_rx_pkt(vvs, skb); + virtio_vsock_consume_skb(skb); + } else { + __skb_queue_head(&vvs->rx_queue, skb); } } =20 @@ -414,10 +423,10 @@ static int virtio_transport_seqpacket_do_dequeue(stru= ct vsock_sock *vsk, int flags) { struct virtio_vsock_sock *vvs =3D vsk->trans; - struct virtio_vsock_pkt *pkt; int dequeued_len =3D 0; size_t user_buf_len =3D msg_data_left(msg); bool msg_ready =3D false; + struct sk_buff *skb; =20 spin_lock_bh(&vvs->rx_lock); =20 @@ -427,13 +436,18 @@ static int virtio_transport_seqpacket_do_dequeue(stru= ct vsock_sock *vsk, } =20 while (!msg_ready) { - pkt =3D list_first_entry(&vvs->rx_queue, struct virtio_vsock_pkt, list); + struct virtio_vsock_hdr *hdr; + + skb =3D __skb_dequeue(&vvs->rx_queue); + if (!skb) + break; + hdr =3D virtio_vsock_hdr(skb); =20 if (dequeued_len >=3D 0) { size_t pkt_len; size_t bytes_to_copy; =20 - pkt_len =3D (size_t)le32_to_cpu(pkt->hdr.len); + pkt_len =3D (size_t)le32_to_cpu(hdr->len); bytes_to_copy =3D min(user_buf_len, pkt_len); =20 if (bytes_to_copy) { @@ -444,7 +458,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct= vsock_sock *vsk, */ spin_unlock_bh(&vvs->rx_lock); =20 - err =3D memcpy_to_msg(msg, pkt->buf, bytes_to_copy); + err =3D memcpy_to_msg(msg, skb->data, bytes_to_copy); if (err) { /* Copy of message failed. Rest of * fragments will be freed without copy. 
@@ -452,6 +466,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct= vsock_sock *vsk, dequeued_len =3D err; } else { user_buf_len -=3D bytes_to_copy; + skb_pull(skb, bytes_to_copy); } =20 spin_lock_bh(&vvs->rx_lock); @@ -461,17 +476,16 @@ static int virtio_transport_seqpacket_do_dequeue(stru= ct vsock_sock *vsk, dequeued_len +=3D pkt_len; } =20 - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) { + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { msg_ready =3D true; vvs->msg_count--; =20 - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) msg->msg_flags |=3D MSG_EOR; } =20 - virtio_transport_dec_rx_pkt(vvs, pkt); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); + virtio_transport_dec_rx_pkt(vvs, skb); + virtio_vsock_kfree_skb(skb); } =20 spin_unlock_bh(&vvs->rx_lock); @@ -609,7 +623,7 @@ int virtio_transport_do_socket_init(struct vsock_sock *= vsk, =20 spin_lock_init(&vvs->rx_lock); spin_lock_init(&vvs->tx_lock); - INIT_LIST_HEAD(&vvs->rx_queue); + skb_queue_head_init(&vvs->rx_queue); =20 return 0; } @@ -806,16 +820,16 @@ void virtio_transport_destruct(struct vsock_sock *vsk) EXPORT_SYMBOL_GPL(virtio_transport_destruct); =20 static int virtio_transport_reset(struct vsock_sock *vsk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { struct virtio_vsock_pkt_info info =3D { .op =3D VIRTIO_VSOCK_OP_RST, - .reply =3D !!pkt, + .reply =3D !!skb, .vsk =3D vsk, }; =20 /* Send RST only if the original pkt is not a RST pkt */ - if (pkt && le16_to_cpu(pkt->hdr.op) =3D=3D VIRTIO_VSOCK_OP_RST) + if (skb && le16_to_cpu(virtio_vsock_hdr(skb)->op) =3D=3D VIRTIO_VSOCK_OP_= RST) return 0; =20 return virtio_transport_send_pkt_info(vsk, &info); @@ -825,29 +839,30 @@ static int virtio_transport_reset(struct vsock_sock *= vsk, * attempt was made to connect to a socket that does not exist. */ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { - struct virtio_vsock_pkt *reply; + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct virtio_vsock_pkt_info info =3D { .op =3D VIRTIO_VSOCK_OP_RST, - .type =3D le16_to_cpu(pkt->hdr.type), + .type =3D le16_to_cpu(hdr->type), .reply =3D true, }; + struct sk_buff *reply; =20 /* Send RST only if the original pkt is not a RST pkt */ - if (le16_to_cpu(pkt->hdr.op) =3D=3D VIRTIO_VSOCK_OP_RST) + if (le16_to_cpu(hdr->op) =3D=3D VIRTIO_VSOCK_OP_RST) return 0; =20 - reply =3D virtio_transport_alloc_pkt(&info, 0, - le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port), - le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); + reply =3D virtio_transport_alloc_skb(&info, 0, + le64_to_cpu(hdr->dst_cid), + le32_to_cpu(hdr->dst_port), + le64_to_cpu(hdr->src_cid), + le32_to_cpu(hdr->src_port)); if (!reply) return -ENOMEM; =20 if (!t) { - virtio_transport_free_pkt(reply); + virtio_vsock_kfree_skb(reply); return -ENOTCONN; } =20 @@ -858,16 +873,11 @@ static int virtio_transport_reset_no_sock(const struc= t virtio_transport *t, static void virtio_transport_remove_sock(struct vsock_sock *vsk) { struct virtio_vsock_sock *vvs =3D vsk->trans; - struct virtio_vsock_pkt *pkt, *tmp; =20 /* We don't need to take rx_lock, as the socket is closing and we are * removing it. 
*/ - list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) { - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } - + virtio_vsock_skb_queue_purge(&vvs->rx_queue); vsock_remove_sock(vsk); } =20 @@ -981,13 +991,14 @@ EXPORT_SYMBOL_GPL(virtio_transport_release); =20 static int virtio_transport_recv_connecting(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct vsock_sock *vsk =3D vsock_sk(sk); - int err; int skerr; + int err; =20 - switch (le16_to_cpu(pkt->hdr.op)) { + switch (le16_to_cpu(hdr->op)) { case VIRTIO_VSOCK_OP_RESPONSE: sk->sk_state =3D TCP_ESTABLISHED; sk->sk_socket->state =3D SS_CONNECTED; @@ -1008,7 +1019,7 @@ virtio_transport_recv_connecting(struct sock *sk, return 0; =20 destroy: - virtio_transport_reset(vsk, pkt); + virtio_transport_reset(vsk, skb); sk->sk_state =3D TCP_CLOSE; sk->sk_err =3D skerr; sk_error_report(sk); @@ -1017,34 +1028,37 @@ virtio_transport_recv_connecting(struct sock *sk, =20 static void virtio_transport_recv_enqueue(struct vsock_sock *vsk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { struct virtio_vsock_sock *vvs =3D vsk->trans; bool can_enqueue, free_pkt =3D false; + struct virtio_vsock_hdr *hdr; + u32 len; =20 - pkt->len =3D le32_to_cpu(pkt->hdr.len); - pkt->off =3D 0; + hdr =3D virtio_vsock_hdr(skb); + len =3D le32_to_cpu(hdr->len); =20 spin_lock_bh(&vvs->rx_lock); =20 - can_enqueue =3D virtio_transport_inc_rx_pkt(vvs, pkt); + can_enqueue =3D virtio_transport_inc_rx_pkt(vvs, skb); if (!can_enqueue) { free_pkt =3D true; goto out; } =20 - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) vvs->msg_count++; =20 /* Try to copy small packets into the buffer of last packet queued, * to avoid wasting memory queueing the entire buffer with a small * payload. */ - if (pkt->len <=3D GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) { - struct virtio_vsock_pkt *last_pkt; + if (len <=3D GOOD_COPY_LEN && !skb_queue_empty_lockless(&vvs->rx_queue)) { + struct virtio_vsock_hdr *last_hdr; + struct sk_buff *last_skb; =20 - last_pkt =3D list_last_entry(&vvs->rx_queue, - struct virtio_vsock_pkt, list); + last_skb =3D skb_peek_tail(&vvs->rx_queue); + last_hdr =3D virtio_vsock_hdr(last_skb); =20 /* If there is space in the last packet queued, we copy the * new packet in its buffer. We avoid this if the last packet @@ -1052,35 +1066,35 @@ virtio_transport_recv_enqueue(struct vsock_sock *vs= k, * delimiter of SEQPACKET message, so 'pkt' is the first packet * of a new message. 
*/ - if ((pkt->len <=3D last_pkt->buf_len - last_pkt->len) && - !(le32_to_cpu(last_pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)) { - memcpy(last_pkt->buf + last_pkt->len, pkt->buf, - pkt->len); - last_pkt->len +=3D pkt->len; + if (skb->len < skb_tailroom(last_skb) && + !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)) { + memcpy(skb_put(last_skb, skb->len), skb->data, skb->len); free_pkt =3D true; - last_pkt->hdr.flags |=3D pkt->hdr.flags; + last_hdr->flags |=3D hdr->flags; + last_hdr->len =3D cpu_to_le32(last_skb->len); goto out; } } =20 - list_add_tail(&pkt->list, &vvs->rx_queue); + __skb_queue_tail(&vvs->rx_queue, skb); =20 out: spin_unlock_bh(&vvs->rx_lock); if (free_pkt) - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); } =20 static int virtio_transport_recv_connected(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct vsock_sock *vsk =3D vsock_sk(sk); int err =3D 0; =20 - switch (le16_to_cpu(pkt->hdr.op)) { + switch (le16_to_cpu(hdr->op)) { case VIRTIO_VSOCK_OP_RW: - virtio_transport_recv_enqueue(vsk, pkt); + virtio_transport_recv_enqueue(vsk, skb); vsock_data_ready(sk); return err; case VIRTIO_VSOCK_OP_CREDIT_REQUEST: @@ -1090,18 +1104,17 @@ virtio_transport_recv_connected(struct sock *sk, sk->sk_write_space(sk); break; case VIRTIO_VSOCK_OP_SHUTDOWN: - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_RCV) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_RCV) vsk->peer_shutdown |=3D RCV_SHUTDOWN; - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_SEND) vsk->peer_shutdown |=3D SEND_SHUTDOWN; if (vsk->peer_shutdown =3D=3D SHUTDOWN_MASK && vsock_stream_has_data(vsk) <=3D 0 && !sock_flag(sk, SOCK_DONE)) { (void)virtio_transport_reset(vsk, NULL); - virtio_transport_do_close(vsk, true); } - if (le32_to_cpu(pkt->hdr.flags)) + if (le32_to_cpu(virtio_vsock_hdr(skb)->flags)) sk->sk_state_change(sk); break; case VIRTIO_VSOCK_OP_RST: @@ -1112,28 +1125,30 @@ virtio_transport_recv_connected(struct sock *sk, break; } =20 - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); return err; } =20 static void virtio_transport_recv_disconnecting(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct vsock_sock *vsk =3D vsock_sk(sk); =20 - if (le16_to_cpu(pkt->hdr.op) =3D=3D VIRTIO_VSOCK_OP_RST) + if (le16_to_cpu(hdr->op) =3D=3D VIRTIO_VSOCK_OP_RST) virtio_transport_do_close(vsk, true); } =20 static int virtio_transport_send_response(struct vsock_sock *vsk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct virtio_vsock_pkt_info info =3D { .op =3D VIRTIO_VSOCK_OP_RESPONSE, - .remote_cid =3D le64_to_cpu(pkt->hdr.src_cid), - .remote_port =3D le32_to_cpu(pkt->hdr.src_port), + .remote_cid =3D le64_to_cpu(hdr->src_cid), + .remote_port =3D le32_to_cpu(hdr->src_port), .reply =3D true, .vsk =3D vsk, }; @@ -1142,8 +1157,9 @@ virtio_transport_send_response(struct vsock_sock *vsk, } =20 static bool virtio_transport_space_update(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct vsock_sock *vsk =3D vsock_sk(sk); struct virtio_vsock_sock *vvs =3D vsk->trans; bool space_available; @@ -1158,8 +1174,8 @@ static bool virtio_transport_space_update(struct sock= *sk, =20 /* buf_alloc and fwd_cnt is 
always included in the hdr */ spin_lock_bh(&vvs->tx_lock); - vvs->peer_buf_alloc =3D le32_to_cpu(pkt->hdr.buf_alloc); - vvs->peer_fwd_cnt =3D le32_to_cpu(pkt->hdr.fwd_cnt); + vvs->peer_buf_alloc =3D le32_to_cpu(hdr->buf_alloc); + vvs->peer_fwd_cnt =3D le32_to_cpu(hdr->fwd_cnt); space_available =3D virtio_transport_has_space(vsk); spin_unlock_bh(&vvs->tx_lock); return space_available; @@ -1167,27 +1183,28 @@ static bool virtio_transport_space_update(struct so= ck *sk, =20 /* Handle server socket */ static int -virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt, +virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, struct virtio_transport *t) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct vsock_sock *vsk =3D vsock_sk(sk); struct vsock_sock *vchild; struct sock *child; int ret; =20 - if (le16_to_cpu(pkt->hdr.op) !=3D VIRTIO_VSOCK_OP_REQUEST) { - virtio_transport_reset_no_sock(t, pkt); + if (le16_to_cpu(hdr->op) !=3D VIRTIO_VSOCK_OP_REQUEST) { + virtio_transport_reset_no_sock(t, skb); return -EINVAL; } =20 if (sk_acceptq_is_full(sk)) { - virtio_transport_reset_no_sock(t, pkt); + virtio_transport_reset_no_sock(t, skb); return -ENOMEM; } =20 child =3D vsock_create_connected(sk); if (!child) { - virtio_transport_reset_no_sock(t, pkt); + virtio_transport_reset_no_sock(t, skb); return -ENOMEM; } =20 @@ -1198,10 +1215,10 @@ virtio_transport_recv_listen(struct sock *sk, struc= t virtio_vsock_pkt *pkt, child->sk_state =3D TCP_ESTABLISHED; =20 vchild =3D vsock_sk(child); - vsock_addr_init(&vchild->local_addr, le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port)); - vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); + vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid), + le32_to_cpu(hdr->dst_port)); + vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid), + le32_to_cpu(hdr->src_port)); =20 ret =3D vsock_assign_transport(vchild, vsk); /* Transport assigned (looking at remote_addr) must be the same @@ -1209,17 +1226,17 @@ virtio_transport_recv_listen(struct sock *sk, struc= t virtio_vsock_pkt *pkt, */ if (ret || vchild->transport !=3D &t->transport) { release_sock(child); - virtio_transport_reset_no_sock(t, pkt); + virtio_transport_reset_no_sock(t, skb); sock_put(child); return ret; } =20 - if (virtio_transport_space_update(child, pkt)) + if (virtio_transport_space_update(child, skb)) child->sk_write_space(child); =20 vsock_insert_connected(vchild); vsock_enqueue_accept(sk, child); - virtio_transport_send_response(vchild, pkt); + virtio_transport_send_response(vchild, skb); =20 release_sock(child); =20 @@ -1237,29 +1254,30 @@ static bool virtio_transport_valid_type(u16 type) * lock. 
*/ void virtio_transport_recv_pkt(struct virtio_transport *t, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr =3D virtio_vsock_hdr(skb); struct sockaddr_vm src, dst; struct vsock_sock *vsk; struct sock *sk; bool space_available; =20 - vsock_addr_init(&src, le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); - vsock_addr_init(&dst, le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port)); + vsock_addr_init(&src, le64_to_cpu(hdr->src_cid), + le32_to_cpu(hdr->src_port)); + vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid), + le32_to_cpu(hdr->dst_port)); =20 trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port, dst.svm_cid, dst.svm_port, - le32_to_cpu(pkt->hdr.len), - le16_to_cpu(pkt->hdr.type), - le16_to_cpu(pkt->hdr.op), - le32_to_cpu(pkt->hdr.flags), - le32_to_cpu(pkt->hdr.buf_alloc), - le32_to_cpu(pkt->hdr.fwd_cnt)); - - if (!virtio_transport_valid_type(le16_to_cpu(pkt->hdr.type))) { - (void)virtio_transport_reset_no_sock(t, pkt); + le32_to_cpu(hdr->len), + le16_to_cpu(hdr->type), + le16_to_cpu(hdr->op), + le32_to_cpu(hdr->flags), + le32_to_cpu(hdr->buf_alloc), + le32_to_cpu(hdr->fwd_cnt)); + + if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) { + (void)virtio_transport_reset_no_sock(t, skb); goto free_pkt; } =20 @@ -1270,13 +1288,13 @@ void virtio_transport_recv_pkt(struct virtio_transp= ort *t, if (!sk) { sk =3D vsock_find_bound_socket(&dst); if (!sk) { - (void)virtio_transport_reset_no_sock(t, pkt); + (void)virtio_transport_reset_no_sock(t, skb); goto free_pkt; } } =20 - if (virtio_transport_get_type(sk) !=3D le16_to_cpu(pkt->hdr.type)) { - (void)virtio_transport_reset_no_sock(t, pkt); + if (virtio_transport_get_type(sk) !=3D le16_to_cpu(hdr->type)) { + (void)virtio_transport_reset_no_sock(t, skb); sock_put(sk); goto free_pkt; } @@ -1287,13 +1305,13 @@ void virtio_transport_recv_pkt(struct virtio_transp= ort *t, =20 /* Check if sk has been closed before lock_sock */ if (sock_flag(sk, SOCK_DONE)) { - (void)virtio_transport_reset_no_sock(t, pkt); + (void)virtio_transport_reset_no_sock(t, skb); release_sock(sk); sock_put(sk); goto free_pkt; } =20 - space_available =3D virtio_transport_space_update(sk, pkt); + space_available =3D virtio_transport_space_update(sk, skb); =20 /* Update CID in case it has changed after a transport reset event */ if (vsk->local_addr.svm_cid !=3D VMADDR_CID_ANY) @@ -1304,23 +1322,23 @@ void virtio_transport_recv_pkt(struct virtio_transp= ort *t, =20 switch (sk->sk_state) { case TCP_LISTEN: - virtio_transport_recv_listen(sk, pkt, t); - virtio_transport_free_pkt(pkt); + virtio_transport_recv_listen(sk, skb, t); + virtio_vsock_kfree_skb(skb); break; case TCP_SYN_SENT: - virtio_transport_recv_connecting(sk, pkt); - virtio_transport_free_pkt(pkt); + virtio_transport_recv_connecting(sk, skb); + virtio_vsock_kfree_skb(skb); break; case TCP_ESTABLISHED: - virtio_transport_recv_connected(sk, pkt); + virtio_transport_recv_connected(sk, skb); break; case TCP_CLOSING: - virtio_transport_recv_disconnecting(sk, pkt); - virtio_transport_free_pkt(pkt); + virtio_transport_recv_disconnecting(sk, skb); + virtio_vsock_kfree_skb(skb); break; default: - (void)virtio_transport_reset_no_sock(t, pkt); - virtio_transport_free_pkt(pkt); + (void)virtio_transport_reset_no_sock(t, skb); + virtio_vsock_kfree_skb(skb); break; } =20 @@ -1333,16 +1351,42 @@ void virtio_transport_recv_pkt(struct virtio_transp= ort *t, return; =20 free_pkt: - virtio_transport_free_pkt(pkt); + virtio_vsock_kfree_skb(skb); } 
EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
 
-void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
+/* Remove skbs found in a queue that have a vsk that matches.
+ *
+ * Each skb is freed.
+ *
+ * Returns the count of skbs that were reply packets.
+ */
+int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue)
 {
-	kvfree(pkt->buf);
-	kfree(pkt);
+	struct sk_buff_head freeme;
+	struct sk_buff *skb, *tmp;
+	int cnt = 0;
+
+	skb_queue_head_init(&freeme);
+
+	spin_lock_bh(&queue->lock);
+	skb_queue_walk_safe(queue, skb, tmp) {
+		if (vsock_sk(skb->sk) != vsk)
+			continue;
+
+		__skb_unlink(skb, queue);
+		__skb_queue_tail(&freeme, skb);
+
+		if (virtio_vsock_skb_reply(skb))
+			cnt++;
+	}
+	spin_unlock_bh(&queue->lock);
+
+	__virtio_vsock_skb_queue_purge(&freeme);
+
+	return cnt;
 }
-EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
+EXPORT_SYMBOL_GPL(virtio_transport_purge_skbs);
 
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Asias He");
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 169a8cf65b39..e57394579146 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -16,7 +16,7 @@ struct vsock_loopback {
 	struct workqueue_struct *workqueue;
 
 	spinlock_t pkt_list_lock; /* protects pkt_list */
-	struct list_head pkt_list;
+	struct sk_buff_head pkt_queue;
 	struct work_struct pkt_work;
 };
 
@@ -27,13 +27,13 @@ static u32 vsock_loopback_get_local_cid(void)
 	return VMADDR_CID_LOCAL;
 }
 
-static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt)
+static int vsock_loopback_send_pkt(struct sk_buff *skb)
 {
 	struct vsock_loopback *vsock = &the_vsock_loopback;
-	int len = pkt->len;
+	int len = skb->len;
 
 	spin_lock_bh(&vsock->pkt_list_lock);
-	list_add_tail(&pkt->list, &vsock->pkt_list);
+	skb_queue_tail(&vsock->pkt_queue, skb);
 	spin_unlock_bh(&vsock->pkt_list_lock);
 
 	queue_work(vsock->workqueue, &vsock->pkt_work);
@@ -44,21 +44,8 @@ static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt)
 static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
 {
 	struct vsock_loopback *vsock = &the_vsock_loopback;
-	struct virtio_vsock_pkt *pkt, *n;
-	LIST_HEAD(freeme);
 
-	spin_lock_bh(&vsock->pkt_list_lock);
-	list_for_each_entry_safe(pkt, n, &vsock->pkt_list, list) {
-		if (pkt->vsk != vsk)
-			continue;
-		list_move(&pkt->list, &freeme);
-	}
-	spin_unlock_bh(&vsock->pkt_list_lock);
-
-	list_for_each_entry_safe(pkt, n, &freeme, list) {
-		list_del(&pkt->list);
-		virtio_transport_free_pkt(pkt);
-	}
+	virtio_transport_purge_skbs(vsk, &vsock->pkt_queue);
 
 	return 0;
 }
@@ -121,20 +108,18 @@ static void vsock_loopback_work(struct work_struct *work)
 {
 	struct vsock_loopback *vsock =
 		container_of(work, struct vsock_loopback, pkt_work);
-	LIST_HEAD(pkts);
+	struct sk_buff_head pkts;
+	struct sk_buff *skb;
+
+	skb_queue_head_init(&pkts);
 
 	spin_lock_bh(&vsock->pkt_list_lock);
-	list_splice_init(&vsock->pkt_list, &pkts);
+	skb_queue_splice_init(&vsock->pkt_queue, &pkts);
 	spin_unlock_bh(&vsock->pkt_list_lock);
 
-	while (!list_empty(&pkts)) {
-		struct virtio_vsock_pkt *pkt;
-
-		pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
-		list_del_init(&pkt->list);
-
-		virtio_transport_deliver_tap_pkt(pkt);
-		virtio_transport_recv_pkt(&loopback_transport, pkt);
+	while ((skb = skb_dequeue(&pkts))) {
+		virtio_transport_deliver_tap_pkt(skb);
+		virtio_transport_recv_pkt(&loopback_transport, skb);
 	}
 }
 
@@ -148,7 +133,7 @@ static int __init vsock_loopback_init(void)
 		return -ENOMEM;
 
 	spin_lock_init(&vsock->pkt_list_lock);
-	INIT_LIST_HEAD(&vsock->pkt_list);
+	skb_queue_head_init(&vsock->pkt_queue);
 	INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
 
 	ret = vsock_core_register(&loopback_transport.transport,
@@ -166,19 +151,13 @@ static int __init vsock_loopback_init(void)
 static void __exit vsock_loopback_exit(void)
 {
 	struct vsock_loopback *vsock = &the_vsock_loopback;
-	struct virtio_vsock_pkt *pkt;
 
 	vsock_core_unregister(&loopback_transport.transport);
 
 	flush_work(&vsock->pkt_work);
 
 	spin_lock_bh(&vsock->pkt_list_lock);
-	while (!list_empty(&vsock->pkt_list)) {
-		pkt = list_first_entry(&vsock->pkt_list,
-				       struct virtio_vsock_pkt, list);
-		list_del(&pkt->list);
-		virtio_transport_free_pkt(pkt);
-	}
+	virtio_vsock_skb_queue_purge(&vsock->pkt_queue);
 	spin_unlock_bh(&vsock->pkt_list_lock);
 
 	destroy_workqueue(vsock->workqueue);
-- 
2.35.1
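Editor's note, for readers less familiar with the queueing idiom above: the new
virtio_transport_purge_skbs() deliberately splits its work into two phases.
Matching skbs are unlinked onto a private list while the sk_buff_head spinlock
is held, and the frees happen only after the lock is dropped, so no freeing is
done under a bottom-half spinlock. The standalone userspace C sketch below
illustrates only that locking discipline, using a pthread mutex in place of the
spinlock; the types and names in it (pkt_node, pkt_queue, purge_matching) are
invented for illustration and are not kernel API. Build with: cc -pthread sketch.c

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct pkt_node {
	struct pkt_node *next;
	void *owner;	/* stand-in for vsock_sk(skb->sk) */
	int reply;	/* stand-in for virtio_vsock_skb_reply(skb) */
};

struct pkt_queue {
	pthread_mutex_t lock;
	struct pkt_node *head;
};

/* Unlink every entry owned by 'owner' while holding the queue lock,
 * then free the unlinked entries after the lock is released.
 * Returns how many of them were replies, mirroring the way the real
 * helper reports reply counts back to its caller.
 */
static int purge_matching(struct pkt_queue *q, void *owner)
{
	struct pkt_node *freeme = NULL, **pp, *node;
	int cnt = 0;

	pthread_mutex_lock(&q->lock);
	pp = &q->head;
	while ((node = *pp)) {
		if (node->owner != owner) {
			pp = &node->next;
			continue;
		}
		*pp = node->next;	/* unlink under the lock */
		node->next = freeme;	/* stash on a private list */
		freeme = node;
		if (node->reply)
			cnt++;
	}
	pthread_mutex_unlock(&q->lock);

	while (freeme) {	/* free with the lock released */
		node = freeme;
		freeme = node->next;
		free(node);
	}

	return cnt;
}

int main(void)
{
	struct pkt_queue q = { PTHREAD_MUTEX_INITIALIZER, NULL };
	int owner_a, owner_b;
	struct pkt_node *n2 = calloc(1, sizeof(*n2));
	struct pkt_node *n1 = calloc(1, sizeof(*n1));

	n2->owner = &owner_b;
	n1->owner = &owner_a;
	n1->reply = 1;
	n1->next = n2;
	q.head = n1;

	/* Expect 1: only owner_a's node is a reply; owner_b's stays queued. */
	printf("replies purged for owner_a: %d\n", purge_matching(&q, &owner_a));
	free(q.head);
	return 0;
}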