From: Bobby Eshleman
Date: Mon, 15 Aug 2022 10:56:04 -0700
Subject: [PATCH 1/6] vsock: replace virtio_vsock_pkt with sk_buff
Message-Id: <65d117ddc530d12a6d47fcc45b38891465a90d9f.1660362668.git.bobby.eshleman@bytedance.com>
Cc: Bobby Eshleman, Cong Wang, Jiang Wang, Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
List-ID: linux-kernel@vger.kernel.org

This patch replaces virtio_vsock_pkt with sk_buff.
The benefit of this series includes: * The bug reported @ https://bugzilla.redhat.com/show_bug.cgi?id=3D2009935 does not present itself when reasonable sk_sndbuf thresholds are set. * Using sock_alloc_send_skb() teaches VSOCK to respect sk_sndbuf for tunability. * Eliminates copying for vsock_deliver_tap(). * sk_buff is required for future improvements, such as using socket map. Signed-off-by: Bobby Eshleman --- drivers/vhost/vsock.c | 214 +++++------ include/linux/virtio_vsock.h | 60 ++- net/vmw_vsock/af_vsock.c | 1 + net/vmw_vsock/virtio_transport.c | 212 +++++----- net/vmw_vsock/virtio_transport_common.c | 491 ++++++++++++------------ net/vmw_vsock/vsock_loopback.c | 51 +-- 6 files changed, 517 insertions(+), 512 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 368330417bde..f8601d93d94d 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -51,8 +51,7 @@ struct vhost_vsock { struct hlist_node hash; =20 struct vhost_work send_pkt_work; - spinlock_t send_pkt_list_lock; - struct list_head send_pkt_list; /* host->guest pending packets */ + struct sk_buff_head send_pkt_queue; /* host->guest pending packets */ =20 atomic_t queued_replies; =20 @@ -108,7 +107,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, vhost_disable_notify(&vsock->dev, vq); =20 do { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; + struct virtio_vsock_hdr *hdr; struct iov_iter iov_iter; unsigned out, in; size_t nbytes; @@ -116,31 +116,22 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, int head; u32 flags_to_restore =3D 0; =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - if (list_empty(&vsock->send_pkt_list)) { - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb =3D skb_dequeue(&vsock->send_pkt_queue); + + if (!skb) { vhost_enable_notify(&vsock->dev, vq); break; } =20 - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - 
head =3D vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); if (head < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb_queue_head(&vsock->send_pkt_queue, skb); break; } =20 if (head =3D=3D vq->num) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb_queue_head(&vsock->send_pkt_queue, skb); =20 /* We cannot finish yet if more buffers snuck in while * re-enabling notify. @@ -153,26 +144,27 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, } =20 if (out) { - virtio_transport_free_pkt(pkt); + kfree_skb(skb); vq_err(vq, "Expected 0 output buffers, got %u\n", out); break; } =20 iov_len =3D iov_length(&vq->iov[out], in); - if (iov_len < sizeof(pkt->hdr)) { - virtio_transport_free_pkt(pkt); + if (iov_len < sizeof(*hdr)) { + kfree_skb(skb); vq_err(vq, "Buffer len [%zu] too small\n", iov_len); break; } =20 iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len); - payload_len =3D pkt->len - pkt->off; + payload_len =3D skb->len - vsock_metadata(skb)->off; + hdr =3D vsock_hdr(skb); =20 /* If the packet is greater than the space available in the * buffer, we split it using multiple buffers. */ - if (payload_len > iov_len - sizeof(pkt->hdr)) { - payload_len =3D iov_len - sizeof(pkt->hdr); + if (payload_len > iov_len - sizeof(*hdr)) { + payload_len =3D iov_len - sizeof(*hdr); =20 /* As we are copying pieces of large packet's buffer to * small rx buffers, headers of packets in rx queue are @@ -185,31 +177,31 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, * bits set. After initialized header will be copied to * rx buffer, these required bits will be restored. 
*/ - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) { - pkt->hdr.flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { + hdr->flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); flags_to_restore |=3D VIRTIO_VSOCK_SEQ_EOM; =20 - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) { - pkt->hdr.flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) { + hdr->flags &=3D ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); flags_to_restore |=3D VIRTIO_VSOCK_SEQ_EOR; } } } =20 /* Set the correct length in the header */ - pkt->hdr.len =3D cpu_to_le32(payload_len); + hdr->len =3D cpu_to_le32(payload_len); =20 - nbytes =3D copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); - if (nbytes !=3D sizeof(pkt->hdr)) { - virtio_transport_free_pkt(pkt); + nbytes =3D copy_to_iter(hdr, sizeof(*hdr), &iov_iter); + if (nbytes !=3D sizeof(*hdr)) { + kfree_skb(skb); vq_err(vq, "Faulted on copying pkt hdr\n"); break; } =20 - nbytes =3D copy_to_iter(pkt->buf + pkt->off, payload_len, + nbytes =3D copy_to_iter(skb->data + vsock_metadata(skb)->off, payload_le= n, &iov_iter); if (nbytes !=3D payload_len) { - virtio_transport_free_pkt(pkt); + kfree_skb(skb); vq_err(vq, "Faulted on copying pkt buf\n"); break; } @@ -217,31 +209,28 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, /* Deliver to monitoring devices all packets that we * will transmit. */ - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); =20 - vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len); + vhost_add_used(vq, head, sizeof(*hdr) + payload_len); added =3D true; =20 - pkt->off +=3D payload_len; + vsock_metadata(skb)->off +=3D payload_len; total_len +=3D payload_len; =20 /* If we didn't send all the payload we can requeue the packet * to send it with the next available buffer. 
*/ - if (pkt->off < pkt->len) { - pkt->hdr.flags |=3D cpu_to_le32(flags_to_restore); + if (vsock_metadata(skb)->off < skb->len) { + hdr->flags |=3D cpu_to_le32(flags_to_restore); =20 - /* We are queueing the same virtio_vsock_pkt to handle + /* We are queueing the same skb to handle * the remaining bytes, and we want to deliver it * to monitoring devices in the next iteration. */ - pkt->tap_delivered =3D false; - - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + vsock_metadata(skb)->flags &=3D ~VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVE= RED; + skb_queue_head(&vsock->send_pkt_queue, skb); } else { - if (pkt->reply) { + if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY) { int val; =20 val =3D atomic_dec_return(&vsock->queued_replies); @@ -253,7 +242,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, restart_tx =3D true; } =20 - virtio_transport_free_pkt(pkt); + consume_skb(skb); } } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); if (added) @@ -278,28 +267,26 @@ static void vhost_transport_send_pkt_work(struct vhos= t_work *work) } =20 static int -vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt) +vhost_transport_send_pkt(struct sk_buff *skb) { struct vhost_vsock *vsock; - int len =3D pkt->len; + int len =3D skb->len; + struct virtio_vsock_hdr *hdr =3D vsock_hdr(skb); =20 rcu_read_lock(); =20 /* Find the vhost_vsock according to guest context id */ - vsock =3D vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid)); + vsock =3D vhost_vsock_get(le64_to_cpu(hdr->dst_cid)); if (!vsock) { rcu_read_unlock(); - virtio_transport_free_pkt(pkt); + kfree_skb(skb); return -ENODEV; } =20 - if (pkt->reply) + if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY) atomic_inc(&vsock->queued_replies); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add_tail(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + 
skb_queue_tail(&vsock->send_pkt_queue, skb); vhost_work_queue(&vsock->dev, &vsock->send_pkt_work); =20 rcu_read_unlock(); @@ -310,10 +297,8 @@ static int vhost_transport_cancel_pkt(struct vsock_sock *vsk) { struct vhost_vsock *vsock; - struct virtio_vsock_pkt *pkt, *n; int cnt =3D 0; int ret =3D -ENODEV; - LIST_HEAD(freeme); =20 rcu_read_lock(); =20 @@ -322,20 +307,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) if (!vsock) goto out; =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) { - if (pkt->vsk !=3D vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - if (pkt->reply) - cnt++; - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + cnt =3D virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue); =20 if (cnt) { struct vhost_virtqueue *tx_vq =3D &vsock->vqs[VSOCK_VQ_TX]; @@ -352,11 +324,12 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) return ret; } =20 -static struct virtio_vsock_pkt * -vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, +static struct sk_buff * +vhost_vsock_alloc_skb(struct vhost_virtqueue *vq, unsigned int out, unsigned int in) { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; + struct virtio_vsock_hdr *hdr; struct iov_iter iov_iter; size_t nbytes; size_t len; @@ -366,50 +339,49 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } =20 - pkt =3D kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + len =3D iov_length(vq->iov, out); + + /* len contains both payload and hdr, so only add additional space for me= tadata */ + skb =3D alloc_skb(len + sizeof(struct virtio_vsock_metadata), GFP_KERNEL); + if (!skb) return NULL; =20 - len =3D iov_length(vq->iov, out); + memset(skb->head, 0, sizeof(struct virtio_vsock_metadata)); + virtio_vsock_skb_reserve(skb); iov_iter_init(&iov_iter, WRITE, vq->iov, out, len); =20 - nbytes =3D 
copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); - if (nbytes !=3D sizeof(pkt->hdr)) { + hdr =3D vsock_hdr(skb); + nbytes =3D copy_from_iter(hdr, sizeof(*hdr), &iov_iter); + if (nbytes !=3D sizeof(*hdr)) { vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n", - sizeof(pkt->hdr), nbytes); - kfree(pkt); + sizeof(*hdr), nbytes); + kfree_skb(skb); return NULL; } =20 - pkt->len =3D le32_to_cpu(pkt->hdr.len); + len =3D le32_to_cpu(hdr->len); =20 /* No payload */ - if (!pkt->len) - return pkt; + if (!len) + return skb; =20 /* The pkt is too big */ - if (pkt->len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { - kfree(pkt); + if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { + kfree_skb(skb); return NULL; } =20 - pkt->buf =3D kmalloc(pkt->len, GFP_KERNEL); - if (!pkt->buf) { - kfree(pkt); - return NULL; - } + virtio_vsock_skb_rx_put(skb); =20 - pkt->buf_len =3D pkt->len; - - nbytes =3D copy_from_iter(pkt->buf, pkt->len, &iov_iter); - if (nbytes !=3D pkt->len) { - vq_err(vq, "Expected %u byte payload, got %zu bytes\n", - pkt->len, nbytes); - virtio_transport_free_pkt(pkt); + nbytes =3D copy_from_iter(skb->data, len, &iov_iter); + if (nbytes !=3D len) { + vq_err(vq, "Expected %zu byte payload, got %zu bytes\n", + len, nbytes); + kfree_skb(skb); return NULL; } =20 - return pkt; + return skb; } =20 /* Is there space left for replies to rx packets? */ @@ -496,7 +468,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_wor= k *work) poll.work); struct vhost_vsock *vsock =3D container_of(vq->dev, struct vhost_vsock, dev); - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; int head, pkts =3D 0, total_len =3D 0; unsigned int out, in; bool added =3D false; @@ -511,6 +483,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_wor= k *work) =20 vhost_disable_notify(&vsock->dev, vq); do { + struct virtio_vsock_hdr *hdr; + u32 len; + if (!vhost_vsock_more_replies(vsock)) { /* Stop tx until the device processes already * pending replies. 
Leave tx virtqueue @@ -532,26 +507,29 @@ static void vhost_vsock_handle_tx_kick(struct vhost_w= ork *work) break; } =20 - pkt =3D vhost_vsock_alloc_pkt(vq, out, in); - if (!pkt) { - vq_err(vq, "Faulted on pkt\n"); + skb =3D vhost_vsock_alloc_skb(vq, out, in); + if (!skb) continue; - } =20 - total_len +=3D sizeof(pkt->hdr) + pkt->len; + len =3D skb->len; =20 /* Deliver to monitoring devices all received packets */ - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); + + hdr =3D vsock_hdr(skb); =20 /* Only accept correctly addressed packets */ - if (le64_to_cpu(pkt->hdr.src_cid) =3D=3D vsock->guest_cid && - le64_to_cpu(pkt->hdr.dst_cid) =3D=3D + if (le64_to_cpu(hdr->src_cid) =3D=3D vsock->guest_cid && + le64_to_cpu(hdr->dst_cid) =3D=3D vhost_transport_get_local_cid()) - virtio_transport_recv_pkt(&vhost_transport, pkt); + virtio_transport_recv_pkt(&vhost_transport, skb); else - virtio_transport_free_pkt(pkt); + kfree_skb(skb); + =20 - vhost_add_used(vq, head, 0); + len +=3D sizeof(*hdr); + vhost_add_used(vq, head, len); + total_len +=3D len; added =3D true; } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); =20 @@ -693,8 +671,7 @@ static int vhost_vsock_dev_open(struct inode *inode, st= ruct file *file) VHOST_VSOCK_WEIGHT, true, NULL); =20 file->private_data =3D vsock; - spin_lock_init(&vsock->send_pkt_list_lock); - INIT_LIST_HEAD(&vsock->send_pkt_list); + skb_queue_head_init(&vsock->send_pkt_queue); vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work); return 0; =20 @@ -760,16 +737,7 @@ static int vhost_vsock_dev_release(struct inode *inode= , struct file *file) vhost_vsock_flush(vsock); vhost_dev_stop(&vsock->dev); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - while (!list_empty(&vsock->send_pkt_list)) { - struct virtio_vsock_pkt *pkt; - - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - virtio_transport_free_pkt(pkt); - } - 
spin_unlock_bh(&vsock->send_pkt_list_lock); + skb_queue_purge(&vsock->send_pkt_queue); =20 vhost_dev_cleanup(&vsock->dev); kfree(vsock->dev.vqs); diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 35d7eedb5e8e..17ed01466875 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -4,9 +4,43 @@ =20 #include #include +#include #include #include =20 +enum virtio_vsock_metadata_flags { + VIRTIO_VSOCK_METADATA_FLAGS_REPLY =3D BIT(0), + VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED =3D BIT(1), +}; + +/* Used only by the virtio/vhost vsock drivers, not related to protocol */ +struct virtio_vsock_metadata { + size_t off; + enum virtio_vsock_metadata_flags flags; +}; + +#define vsock_hdr(skb) \ + ((struct virtio_vsock_hdr *) \ + ((void *)skb->head + sizeof(struct virtio_vsock_metadata))) + +#define vsock_metadata(skb) \ + ((struct virtio_vsock_metadata *)skb->head) + +#define virtio_vsock_skb_reserve(skb) \ + skb_reserve(skb, \ + sizeof(struct virtio_vsock_metadata) + \ + sizeof(struct virtio_vsock_hdr)) + +static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb) +{ + u32 len; + + len =3D le32_to_cpu(vsock_hdr(skb)->len); + + if (len > 0) + skb_put(skb, len); +} + #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4) #define VIRTIO_VSOCK_MAX_BUF_SIZE 0xFFFFFFFFUL #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64) @@ -35,23 +69,10 @@ struct virtio_vsock_sock { u32 last_fwd_cnt; u32 rx_bytes; u32 buf_alloc; - struct list_head rx_queue; + struct sk_buff_head rx_queue; u32 msg_count; }; =20 -struct virtio_vsock_pkt { - struct virtio_vsock_hdr hdr; - struct list_head list; - /* socket refcnt not held, only use for cancellation */ - struct vsock_sock *vsk; - void *buf; - u32 buf_len; - u32 len; - u32 off; - bool reply; - bool tap_delivered; -}; - struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; @@ -68,7 +89,7 @@ struct virtio_transport { struct vsock_transport transport; =20 /* Takes ownership 
of the packet */ - int (*send_pkt)(struct virtio_vsock_pkt *pkt); + int (*send_pkt)(struct sk_buff *skb); }; =20 ssize_t @@ -149,11 +170,10 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk); =20 void virtio_transport_recv_pkt(struct virtio_transport *t, - struct virtio_vsock_pkt *pkt); -void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt); -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct vir= tio_vsock_pkt *pkt); + struct sk_buff *skb); +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_= buff *skb); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit= ); -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt); - +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb); +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue); #endif /* _LINUX_VIRTIO_VSOCK_H */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index f04abf662ec6..e348b2d09eac 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -748,6 +748,7 @@ static struct sock *__vsock_create(struct net *net, vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); =20 + sk->sk_allocation =3D GFP_KERNEL; sk->sk_destruct =3D vsock_sk_destruct; sk->sk_backlog_rcv =3D vsock_queue_rcv_skb; sock_reset_flag(sk, SOCK_DONE); diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transp= ort.c index ad64f403536a..3bb293fd8607 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -21,6 +21,12 @@ #include #include =20 +#define VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE \ + (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE \ + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) \ + - sizeof(struct virtio_vsock_hdr) \ + - sizeof(struct 
virtio_vsock_metadata)) + static struct workqueue_struct *virtio_vsock_workqueue; static struct virtio_vsock __rcu *the_virtio_vsock; static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock = */ @@ -42,8 +48,7 @@ struct virtio_vsock { bool tx_run; =20 struct work_struct send_pkt_work; - spinlock_t send_pkt_list_lock; - struct list_head send_pkt_list; + struct sk_buff_head send_pkt_queue; =20 atomic_t queued_replies; =20 @@ -101,41 +106,32 @@ virtio_transport_send_pkt_work(struct work_struct *wo= rk) vq =3D vsock->vqs[VSOCK_VQ_TX]; =20 for (;;) { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; struct scatterlist hdr, buf, *sgs[2]; int ret, in_sg =3D 0, out_sg =3D 0; bool reply; =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - if (list_empty(&vsock->send_pkt_list)) { - spin_unlock_bh(&vsock->send_pkt_list_lock); - break; - } + skb =3D skb_dequeue(&vsock->send_pkt_queue); =20 - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - - virtio_transport_deliver_tap_pkt(pkt); + if (!skb) + break; =20 - reply =3D pkt->reply; + virtio_transport_deliver_tap_pkt(skb); + reply =3D vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY; =20 - sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); + sg_init_one(&hdr, vsock_hdr(skb), sizeof(*vsock_hdr(skb))); sgs[out_sg++] =3D &hdr; - if (pkt->buf) { - sg_init_one(&buf, pkt->buf, pkt->len); + if (skb->len > 0) { + sg_init_one(&buf, skb->data, skb->len); sgs[out_sg++] =3D &buf; } =20 - ret =3D virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL); + ret =3D virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, GFP_KERNEL); /* Usually this means that there is no more space available in * the vq */ if (ret < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb_queue_head(&vsock->send_pkt_queue, skb); break; } =20 
@@ -163,33 +159,84 @@ virtio_transport_send_pkt_work(struct work_struct *wo= rk) queue_work(virtio_vsock_workqueue, &vsock->rx_work); } =20 +static inline bool +virtio_transport_skbs_can_merge(struct sk_buff *old, struct sk_buff *new) +{ + return (new->len < GOOD_COPY_LEN && + skb_tailroom(old) >=3D new->len && + vsock_hdr(new)->src_cid =3D=3D vsock_hdr(old)->src_cid && + vsock_hdr(new)->dst_cid =3D=3D vsock_hdr(old)->dst_cid && + vsock_hdr(new)->src_port =3D=3D vsock_hdr(old)->src_port && + vsock_hdr(new)->dst_port =3D=3D vsock_hdr(old)->dst_port && + vsock_hdr(new)->type =3D=3D vsock_hdr(old)->type && + vsock_hdr(new)->flags =3D=3D vsock_hdr(old)->flags && + vsock_hdr(old)->op =3D=3D VIRTIO_VSOCK_OP_RW && + vsock_hdr(new)->op =3D=3D VIRTIO_VSOCK_OP_RW); +} + +/** + * Merge the two most recent skbs together if possible. + * + * Caller must hold the queue lock. + */ +static void +virtio_transport_add_to_queue(struct sk_buff_head *queue, struct sk_buff *= new) +{ + struct sk_buff *old; + + spin_lock_bh(&queue->lock); + /* In order to reduce skb memory overhead, we merge new packets with + * older packets if they pass virtio_transport_skbs_can_merge(). 
+ */ + if (skb_queue_empty_lockless(queue)) { + __skb_queue_tail(queue, new); + goto out; + } + + old =3D skb_peek_tail(queue); + + if (!virtio_transport_skbs_can_merge(old, new)) { + __skb_queue_tail(queue, new); + goto out; + } + + memcpy(skb_put(old, new->len), new->data, new->len); + vsock_hdr(old)->len =3D cpu_to_le32(old->len); + vsock_hdr(old)->buf_alloc =3D vsock_hdr(new)->buf_alloc; + vsock_hdr(old)->fwd_cnt =3D vsock_hdr(new)->fwd_cnt; + dev_kfree_skb_any(new); + +out: + spin_unlock_bh(&queue->lock); +} + static int -virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt) +virtio_transport_send_pkt(struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr; struct virtio_vsock *vsock; - int len =3D pkt->len; + int len =3D skb->len; + + hdr =3D vsock_hdr(skb); =20 rcu_read_lock(); vsock =3D rcu_dereference(the_virtio_vsock); if (!vsock) { - virtio_transport_free_pkt(pkt); + kfree_skb(skb); len =3D -ENODEV; goto out_rcu; } =20 - if (le64_to_cpu(pkt->hdr.dst_cid) =3D=3D vsock->guest_cid) { - virtio_transport_free_pkt(pkt); + if (le64_to_cpu(hdr->dst_cid) =3D=3D vsock->guest_cid) { + kfree_skb(skb); len =3D -ENODEV; goto out_rcu; } =20 - if (pkt->reply) + if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY) atomic_inc(&vsock->queued_replies); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add_tail(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_transport_add_to_queue(&vsock->send_pkt_queue, skb); queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work); =20 out_rcu: @@ -201,9 +248,7 @@ static int virtio_transport_cancel_pkt(struct vsock_sock *vsk) { struct virtio_vsock *vsock; - struct virtio_vsock_pkt *pkt, *n; int cnt =3D 0, ret; - LIST_HEAD(freeme); =20 rcu_read_lock(); vsock =3D rcu_dereference(the_virtio_vsock); @@ -212,20 +257,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk) goto out_rcu; } =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - list_for_each_entry_safe(pkt, n, 
&vsock->send_pkt_list, list) { - if (pkt->vsk !=3D vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - if (pkt->reply) - cnt++; - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + cnt =3D virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue); =20 if (cnt) { struct virtqueue *rx_vq =3D vsock->vqs[VSOCK_VQ_RX]; @@ -246,38 +278,34 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk) =20 static void virtio_vsock_rx_fill(struct virtio_vsock *vsock) { - int buf_len =3D VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE; - struct virtio_vsock_pkt *pkt; - struct scatterlist hdr, buf, *sgs[2]; + struct scatterlist pkt, *sgs[1]; struct virtqueue *vq; int ret; =20 vq =3D vsock->vqs[VSOCK_VQ_RX]; =20 do { - pkt =3D kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) - break; + struct sk_buff *skb; + const size_t len =3D VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE - + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); =20 - pkt->buf =3D kmalloc(buf_len, GFP_KERNEL); - if (!pkt->buf) { - virtio_transport_free_pkt(pkt); + skb =3D alloc_skb(len, GFP_KERNEL); + if (!skb) break; - } =20 - pkt->buf_len =3D buf_len; - pkt->len =3D buf_len; + memset(skb->head, 0, + sizeof(struct virtio_vsock_metadata) + sizeof(struct virtio_vsock= _hdr)); =20 - sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); - sgs[0] =3D &hdr; + sg_init_one(&pkt, skb->head + sizeof(struct virtio_vsock_metadata), + VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE); + sgs[0] =3D &pkt; =20 - sg_init_one(&buf, pkt->buf, buf_len); - sgs[1] =3D &buf; - ret =3D virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL); - if (ret) { - virtio_transport_free_pkt(pkt); + ret =3D virtqueue_add_sgs(vq, sgs, 0, 1, skb, GFP_KERNEL); + if (ret < 0) { + kfree_skb(skb); break; } + vsock->rx_buf_nr++; } while (vq->num_free); if (vsock->rx_buf_nr > vsock->rx_buf_max_nr) @@ -299,12 +327,12 @@ static void virtio_transport_tx_work(struct work_stru= ct *work) goto out; =20 do { 
- struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; unsigned int len; =20 virtqueue_disable_cb(vq); - while ((pkt =3D virtqueue_get_buf(vq, &len)) !=3D NULL) { - virtio_transport_free_pkt(pkt); + while ((skb =3D virtqueue_get_buf(vq, &len)) !=3D NULL) { + consume_skb(skb); added =3D true; } } while (!virtqueue_enable_cb(vq)); @@ -529,7 +557,8 @@ static void virtio_transport_rx_work(struct work_struct= *work) do { virtqueue_disable_cb(vq); for (;;) { - struct virtio_vsock_pkt *pkt; + struct virtio_vsock_hdr *hdr; + struct sk_buff *skb; unsigned int len; =20 if (!virtio_transport_more_replies(vsock)) { @@ -540,23 +569,24 @@ static void virtio_transport_rx_work(struct work_stru= ct *work) goto out; } =20 - pkt =3D virtqueue_get_buf(vq, &len); - if (!pkt) { + skb =3D virtqueue_get_buf(vq, &len); + if (!skb) break; - } =20 vsock->rx_buf_nr--; =20 /* Drop short/long packets */ - if (unlikely(len < sizeof(pkt->hdr) || - len > sizeof(pkt->hdr) + pkt->len)) { - virtio_transport_free_pkt(pkt); + if (unlikely(len < sizeof(*hdr) || + len > VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE)) { + kfree_skb(skb); continue; } =20 - pkt->len =3D len - sizeof(pkt->hdr); - virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&virtio_transport, pkt); + hdr =3D vsock_hdr(skb); + virtio_vsock_skb_reserve(skb); + virtio_vsock_skb_rx_put(skb); + virtio_transport_deliver_tap_pkt(skb); + virtio_transport_recv_pkt(&virtio_transport, skb); } } while (!virtqueue_enable_cb(vq)); =20 @@ -610,7 +640,7 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *v= sock) static void virtio_vsock_vqs_del(struct virtio_vsock *vsock) { struct virtio_device *vdev =3D vsock->vdev; - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; =20 /* Reset all connected sockets when the VQs disappear */ vsock_for_each_connected_socket(&virtio_transport.transport, @@ -637,23 +667,16 @@ static void virtio_vsock_vqs_del(struct virtio_vsock = *vsock) virtio_reset_device(vdev); =20 mutex_lock(&vsock->rx_lock); - 
while ((pkt =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) - virtio_transport_free_pkt(pkt); + while ((skb =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) + kfree_skb(skb); mutex_unlock(&vsock->rx_lock); =20 mutex_lock(&vsock->tx_lock); - while ((pkt =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX]))) - virtio_transport_free_pkt(pkt); + while ((skb =3D virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX]))) + kfree_skb(skb); mutex_unlock(&vsock->tx_lock); =20 - spin_lock_bh(&vsock->send_pkt_list_lock); - while (!list_empty(&vsock->send_pkt_list)) { - pkt =3D list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb_queue_purge(&vsock->send_pkt_queue); =20 /* Delete virtqueues and flush outstanding callbacks if any */ vdev->config->del_vqs(vdev); @@ -690,8 +713,7 @@ static int virtio_vsock_probe(struct virtio_device *vde= v) mutex_init(&vsock->tx_lock); mutex_init(&vsock->rx_lock); mutex_init(&vsock->event_lock); - spin_lock_init(&vsock->send_pkt_list_lock); - INIT_LIST_HEAD(&vsock->send_pkt_list); + skb_queue_head_init(&vsock->send_pkt_queue); INIT_WORK(&vsock->rx_work, virtio_transport_rx_work); INIT_WORK(&vsock->tx_work, virtio_transport_tx_work); INIT_WORK(&vsock->event_work, virtio_transport_event_work); diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio= _transport_common.c index ec2c2afbf0d0..920578597bb9 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -37,53 +37,81 @@ virtio_transport_get_ops(struct vsock_sock *vsk) return container_of(t, struct virtio_transport, transport); } =20 -static struct virtio_vsock_pkt * -virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, +/* Returns a new packet on success, otherwise returns NULL. + * + * If NULL is returned, errp is set to a negative errno. 
+ */
+static struct sk_buff *
+virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
 			   size_t len,
 			   u32 src_cid,
 			   u32 src_port,
 			   u32 dst_cid,
-			   u32 dst_port)
+			   u32 dst_port,
+			   int *errp)
 {
-	struct virtio_vsock_pkt *pkt;
+	struct sk_buff *skb;
+	struct virtio_vsock_hdr *hdr;
+	void *payload;
+	const size_t skb_len = sizeof(*hdr) + sizeof(struct virtio_vsock_metadata) + len;
 	int err;
 
-	pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
-	if (!pkt)
-		return NULL;
+	if (info->vsk) {
+		unsigned int msg_flags = info->msg ? info->msg->msg_flags : 0;
+		struct sock *sk;
 
-	pkt->hdr.type = cpu_to_le16(info->type);
-	pkt->hdr.op = cpu_to_le16(info->op);
-	pkt->hdr.src_cid = cpu_to_le64(src_cid);
-	pkt->hdr.dst_cid = cpu_to_le64(dst_cid);
-	pkt->hdr.src_port = cpu_to_le32(src_port);
-	pkt->hdr.dst_port = cpu_to_le32(dst_port);
-	pkt->hdr.flags = cpu_to_le32(info->flags);
-	pkt->len = len;
-	pkt->hdr.len = cpu_to_le32(len);
-	pkt->reply = info->reply;
-	pkt->vsk = info->vsk;
+		sk = sk_vsock(info->vsk);
+		skb = sock_alloc_send_skb(sk, skb_len,
+					  msg_flags & MSG_DONTWAIT, errp);
 
-	if (info->msg && len > 0) {
-		pkt->buf = kmalloc(len, GFP_KERNEL);
-		if (!pkt->buf)
-			goto out_pkt;
+		if (skb)
+			skb->priority = sk->sk_priority;
+	} else {
+		skb = alloc_skb(skb_len, GFP_KERNEL);
+	}
+
+	if (!skb) {
+		/* If using alloc_skb(), the skb is NULL due to lacking memory.
+		 * Otherwise, errp is set by sock_alloc_send_skb().
+		 */
+		if (!info->vsk)
+			*errp = -ENOMEM;
+		return NULL;
+	}
 
-		pkt->buf_len = len;
+	memset(skb->head, 0, sizeof(*hdr) + sizeof(struct virtio_vsock_metadata));
+	virtio_vsock_skb_reserve(skb);
+	payload = skb_put(skb, len);
 
-		err = memcpy_from_msg(pkt->buf, info->msg, len);
-		if (err)
+	hdr = vsock_hdr(skb);
+	hdr->type = cpu_to_le16(info->type);
+	hdr->op = cpu_to_le16(info->op);
+	hdr->src_cid = cpu_to_le64(src_cid);
+	hdr->dst_cid = cpu_to_le64(dst_cid);
+	hdr->src_port = cpu_to_le32(src_port);
+	hdr->dst_port = cpu_to_le32(dst_port);
+	hdr->flags = cpu_to_le32(info->flags);
+	hdr->len = cpu_to_le32(len);
+
+	if (info->msg && len > 0) {
+		err = memcpy_from_msg(payload, info->msg, len);
+		if (err) {
+			*errp = -ENOMEM;
 			goto out;
+		}
 
 		if (msg_data_left(info->msg) == 0 &&
 		    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
-			pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
+			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
 
 			if (info->msg->msg_flags & MSG_EOR)
-				pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
+				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
 		}
 	}
 
+	if (info->reply)
+		vsock_metadata(skb)->flags |= VIRTIO_VSOCK_METADATA_FLAGS_REPLY;
+
 	trace_virtio_transport_alloc_pkt(src_cid, src_port,
 					 dst_cid, dst_port,
 					 len,
@@ -91,85 +119,26 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 					 info->op,
 					 info->flags);
 
-	return pkt;
+	return skb;
 
 out:
-	kfree(pkt->buf);
-out_pkt:
-	kfree(pkt);
+	kfree_skb(skb);
 	return NULL;
 }
 
 /* Packet capture */
 static struct sk_buff *virtio_transport_build_skb(void *opaque)
 {
-	struct virtio_vsock_pkt *pkt = opaque;
-	struct af_vsockmon_hdr *hdr;
-	struct sk_buff *skb;
-	size_t payload_len;
-	void *payload_buf;
-
-	/* A packet could be split to fit the RX buffer, so we can retrieve
-	 * the payload length from the header and the buffer pointer taking
-	 * care of the offset in the original packet.
-	 */
-	payload_len = le32_to_cpu(pkt->hdr.len);
-	payload_buf = pkt->buf + pkt->off;
-
-	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
-			GFP_ATOMIC);
-	if (!skb)
-		return NULL;
-
-	hdr = skb_put(skb, sizeof(*hdr));
-
-	/* pkt->hdr is little-endian so no need to byteswap here */
-	hdr->src_cid = pkt->hdr.src_cid;
-	hdr->src_port = pkt->hdr.src_port;
-	hdr->dst_cid = pkt->hdr.dst_cid;
-	hdr->dst_port = pkt->hdr.dst_port;
-
-	hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO);
-	hdr->len = cpu_to_le16(sizeof(pkt->hdr));
-	memset(hdr->reserved, 0, sizeof(hdr->reserved));
-
-	switch (le16_to_cpu(pkt->hdr.op)) {
-	case VIRTIO_VSOCK_OP_REQUEST:
-	case VIRTIO_VSOCK_OP_RESPONSE:
-		hdr->op = cpu_to_le16(AF_VSOCK_OP_CONNECT);
-		break;
-	case VIRTIO_VSOCK_OP_RST:
-	case VIRTIO_VSOCK_OP_SHUTDOWN:
-		hdr->op = cpu_to_le16(AF_VSOCK_OP_DISCONNECT);
-		break;
-	case VIRTIO_VSOCK_OP_RW:
-		hdr->op = cpu_to_le16(AF_VSOCK_OP_PAYLOAD);
-		break;
-	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
-	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
-		hdr->op = cpu_to_le16(AF_VSOCK_OP_CONTROL);
-		break;
-	default:
-		hdr->op = cpu_to_le16(AF_VSOCK_OP_UNKNOWN);
-		break;
-	}
-
-	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
-
-	if (payload_len) {
-		skb_put_data(skb, payload_buf, payload_len);
-	}
-
-	return skb;
+	return (struct sk_buff *)opaque;
 }
 
-void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
+void virtio_transport_deliver_tap_pkt(struct sk_buff *skb)
 {
-	if (pkt->tap_delivered)
+	if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED)
 		return;
 
-	vsock_deliver_tap(virtio_transport_build_skb, pkt);
-	pkt->tap_delivered = true;
+	vsock_deliver_tap(virtio_transport_build_skb, skb);
+	vsock_metadata(skb)->flags |= VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
 
@@ -192,8 +161,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	u32 src_cid, src_port, dst_cid, dst_port;
 	const struct virtio_transport *t_ops;
 	struct virtio_vsock_sock *vvs;
-	struct virtio_vsock_pkt *pkt;
+	struct sk_buff *skb;
 	u32 pkt_len = info->pkt_len;
+	int err;
 
 	info->type = virtio_transport_get_type(sk_vsock(vsk));
 
@@ -224,42 +194,47 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
 		return pkt_len;
 
-	pkt = virtio_transport_alloc_pkt(info, pkt_len,
+	skb = virtio_transport_alloc_skb(info, pkt_len,
 					 src_cid, src_port,
-					 dst_cid, dst_port);
-	if (!pkt) {
+					 dst_cid, dst_port,
+					 &err);
+	if (!skb) {
 		virtio_transport_put_credit(vvs, pkt_len);
-		return -ENOMEM;
+		return err;
 	}
 
-	virtio_transport_inc_tx_pkt(vvs, pkt);
+	virtio_transport_inc_tx_pkt(vvs, skb);
+
+	err = t_ops->send_pkt(skb);
 
-	return t_ops->send_pkt(pkt);
+	return err < 0 ? -ENOMEM : err;
 }
 
 static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
-					struct virtio_vsock_pkt *pkt)
+					struct sk_buff *skb)
 {
-	if (vvs->rx_bytes + pkt->len > vvs->buf_alloc)
+	if (vvs->rx_bytes + skb->len > vvs->buf_alloc)
 		return false;
 
-	vvs->rx_bytes += pkt->len;
+	vvs->rx_bytes += skb->len;
 	return true;
 }
 
 static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
-					struct virtio_vsock_pkt *pkt)
+					struct sk_buff *skb)
 {
-	vvs->rx_bytes -= pkt->len;
-	vvs->fwd_cnt += pkt->len;
+	vvs->rx_bytes -= skb->len;
+	vvs->fwd_cnt += skb->len;
 }
 
-void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
+void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb)
 {
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
+
 	spin_lock_bh(&vvs->rx_lock);
 	vvs->last_fwd_cnt = vvs->fwd_cnt;
-	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
-	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
+	hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
+	hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc);
 	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
@@ -303,29 +278,29 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
 			       size_t len)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
-	struct virtio_vsock_pkt *pkt;
+	struct sk_buff *skb, *tmp;
 	size_t bytes, total = 0, off;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
 
-	list_for_each_entry(pkt, &vvs->rx_queue, list) {
-		off = pkt->off;
+	skb_queue_walk_safe(&vvs->rx_queue, skb, tmp) {
+		off = vsock_metadata(skb)->off;
 
 		if (total == len)
 			break;
 
-		while (total < len && off < pkt->len) {
+		while (total < len && off < skb->len) {
 			bytes = len - total;
-			if (bytes > pkt->len - off)
-				bytes = pkt->len - off;
+			if (bytes > skb->len - off)
+				bytes = skb->len - off;
 
 			/* sk_lock is held by caller so no one else can dequeue.
 			 * Unlock rx_lock since memcpy_to_msg() may sleep.
 			 */
 			spin_unlock_bh(&vvs->rx_lock);
 
-			err = memcpy_to_msg(msg, pkt->buf + off, bytes);
+			err = memcpy_to_msg(msg, skb->data + off, bytes);
 			if (err)
 				goto out;
 
@@ -352,37 +327,40 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 				   size_t len)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
-	struct virtio_vsock_pkt *pkt;
+	struct sk_buff *skb;
 	size_t bytes, total = 0;
 	u32 free_space;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
-	while (total < len && !list_empty(&vvs->rx_queue)) {
-		pkt = list_first_entry(&vvs->rx_queue,
-				       struct virtio_vsock_pkt, list);
+	while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
+		skb = __skb_dequeue(&vvs->rx_queue);
 
 		bytes = len - total;
-		if (bytes > pkt->len - pkt->off)
-			bytes = pkt->len - pkt->off;
+		if (bytes > skb->len - vsock_metadata(skb)->off)
+			bytes = skb->len - vsock_metadata(skb)->off;
 
 		/* sk_lock is held by caller so no one else can dequeue.
 		 * Unlock rx_lock since memcpy_to_msg() may sleep.
 		 */
 		spin_unlock_bh(&vvs->rx_lock);
 
-		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
+		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, bytes);
 		if (err)
 			goto out;
 
 		spin_lock_bh(&vvs->rx_lock);
 
 		total += bytes;
-		pkt->off += bytes;
-		if (pkt->off == pkt->len) {
-			virtio_transport_dec_rx_pkt(vvs, pkt);
-			list_del(&pkt->list);
-			virtio_transport_free_pkt(pkt);
+		vsock_metadata(skb)->off += bytes;
+
+		WARN_ON(vsock_metadata(skb)->off > skb->len);
+
+		if (vsock_metadata(skb)->off == skb->len) {
+			virtio_transport_dec_rx_pkt(vvs, skb);
+			consume_skb(skb);
+		} else {
+			__skb_queue_head(&vvs->rx_queue, skb);
 		}
 	}
 
@@ -414,7 +392,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 						 int flags)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
-	struct virtio_vsock_pkt *pkt;
+	struct sk_buff *skb;
 	int dequeued_len = 0;
 	size_t user_buf_len = msg_data_left(msg);
 	bool msg_ready = false;
@@ -427,13 +405,16 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	while (!msg_ready) {
-		pkt = list_first_entry(&vvs->rx_queue, struct virtio_vsock_pkt, list);
+		struct virtio_vsock_hdr *hdr;
+
+		skb = __skb_dequeue(&vvs->rx_queue);
+		hdr = vsock_hdr(skb);
 
 		if (dequeued_len >= 0) {
 			size_t pkt_len;
 			size_t bytes_to_copy;
 
-			pkt_len = (size_t)le32_to_cpu(pkt->hdr.len);
+			pkt_len = (size_t)le32_to_cpu(hdr->len);
 			bytes_to_copy = min(user_buf_len, pkt_len);
 
 			if (bytes_to_copy) {
@@ -444,7 +425,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 				 */
 				spin_unlock_bh(&vvs->rx_lock);
 
-				err = memcpy_to_msg(msg, pkt->buf, bytes_to_copy);
+				err = memcpy_to_msg(msg, skb->data, bytes_to_copy);
 				if (err) {
 					/* Copy of message failed. Rest of
 					 * fragments will be freed without copy.
@@ -461,17 +442,16 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 			dequeued_len += pkt_len;
 		}
 
-		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) {
+		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
 			msg_ready = true;
 			vvs->msg_count--;
 
-			if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR)
+			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
 				msg->msg_flags |= MSG_EOR;
 		}
 
-		virtio_transport_dec_rx_pkt(vvs, pkt);
-		list_del(&pkt->list);
-		virtio_transport_free_pkt(pkt);
+		virtio_transport_dec_rx_pkt(vvs, skb);
+		kfree_skb(skb);
 	}
 
 	spin_unlock_bh(&vvs->rx_lock);
@@ -609,7 +589,7 @@ int virtio_transport_do_socket_init(struct vsock_sock *vsk,
 
 	spin_lock_init(&vvs->rx_lock);
 	spin_lock_init(&vvs->tx_lock);
-	INIT_LIST_HEAD(&vvs->rx_queue);
+	skb_queue_head_init(&vvs->rx_queue);
 
 	return 0;
 }
@@ -809,16 +789,16 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
 EXPORT_SYMBOL_GPL(virtio_transport_destruct);
 
 static int virtio_transport_reset(struct vsock_sock *vsk,
-				  struct virtio_vsock_pkt *pkt)
+				  struct sk_buff *skb)
 {
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
-		.reply = !!pkt,
+		.reply = !!skb,
 		.vsk = vsk,
 	};
 
 	/* Send RST only if the original pkt is not a RST pkt */
-	if (pkt && le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
+	if (skb && le16_to_cpu(vsock_hdr(skb)->op) == VIRTIO_VSOCK_OP_RST)
 		return 0;
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -828,29 +808,32 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 * attempt was made to connect to a socket that does not exist.
 */
static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
-					  struct virtio_vsock_pkt *pkt)
+					  struct sk_buff *skb)
 {
-	struct virtio_vsock_pkt *reply;
+	struct sk_buff *reply;
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
-		.type = le16_to_cpu(pkt->hdr.type),
+		.type = le16_to_cpu(hdr->type),
 		.reply = true,
 	};
+	int err;
 
 	/* Send RST only if the original pkt is not a RST pkt */
-	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
+	if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
 		return 0;
 
-	reply = virtio_transport_alloc_pkt(&info, 0,
-					   le64_to_cpu(pkt->hdr.dst_cid),
-					   le32_to_cpu(pkt->hdr.dst_port),
-					   le64_to_cpu(pkt->hdr.src_cid),
-					   le32_to_cpu(pkt->hdr.src_port));
+	reply = virtio_transport_alloc_skb(&info, 0,
+					   le64_to_cpu(hdr->dst_cid),
+					   le32_to_cpu(hdr->dst_port),
+					   le64_to_cpu(hdr->src_cid),
+					   le32_to_cpu(hdr->src_port),
+					   &err);
 	if (!reply)
-		return -ENOMEM;
+		return err;
 
 	if (!t) {
-		virtio_transport_free_pkt(reply);
+		kfree_skb(reply);
 		return -ENOTCONN;
 	}
 
@@ -861,16 +844,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 static void virtio_transport_remove_sock(struct vsock_sock *vsk)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
-	struct virtio_vsock_pkt *pkt, *tmp;
 
 	/* We don't need to take rx_lock, as the socket is closing and we are
 	 * removing it.
 	 */
-	list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) {
-		list_del(&pkt->list);
-		virtio_transport_free_pkt(pkt);
-	}
-
+	__skb_queue_purge(&vvs->rx_queue);
 	vsock_remove_sock(vsk);
 }
 
@@ -984,13 +962,14 @@ EXPORT_SYMBOL_GPL(virtio_transport_release);
 
 static int
 virtio_transport_recv_connecting(struct sock *sk,
-				 struct virtio_vsock_pkt *pkt)
+				 struct sk_buff *skb)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 	int err;
 	int skerr;
 
-	switch (le16_to_cpu(pkt->hdr.op)) {
+	switch (le16_to_cpu(hdr->op)) {
 	case VIRTIO_VSOCK_OP_RESPONSE:
 		sk->sk_state = TCP_ESTABLISHED;
 		sk->sk_socket->state = SS_CONNECTED;
@@ -1011,7 +990,7 @@ virtio_transport_recv_connecting(struct sock *sk,
 	return 0;
 
 destroy:
-	virtio_transport_reset(vsk, pkt);
+	virtio_transport_reset(vsk, skb);
 	sk->sk_state = TCP_CLOSE;
 	sk->sk_err = skerr;
 	sk_error_report(sk);
@@ -1020,34 +999,38 @@ virtio_transport_recv_connecting(struct sock *sk,
 
 static void
 virtio_transport_recv_enqueue(struct vsock_sock *vsk,
-			      struct virtio_vsock_pkt *pkt)
+			      struct sk_buff *skb)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_hdr *hdr;
 	bool can_enqueue, free_pkt = false;
+	u32 len;
 
-	pkt->len = le32_to_cpu(pkt->hdr.len);
-	pkt->off = 0;
+	hdr = vsock_hdr(skb);
+	len = le32_to_cpu(hdr->len);
+	vsock_metadata(skb)->off = 0;
 
 	spin_lock_bh(&vvs->rx_lock);
 
-	can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
+	can_enqueue = virtio_transport_inc_rx_pkt(vvs, skb);
 	if (!can_enqueue) {
 		free_pkt = true;
 		goto out;
 	}
 
-	if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)
+	if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
 		vvs->msg_count++;
 
 	/* Try to copy small packets into the buffer of last packet queued,
 	 * to avoid wasting memory queueing the entire buffer with a small
 	 * payload.
 	 */
-	if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
-		struct virtio_vsock_pkt *last_pkt;
+	if (len <= GOOD_COPY_LEN && !skb_queue_empty_lockless(&vvs->rx_queue)) {
+		struct virtio_vsock_hdr *last_hdr;
+		struct sk_buff *last_skb;
 
-		last_pkt = list_last_entry(&vvs->rx_queue,
-					   struct virtio_vsock_pkt, list);
+		last_skb = skb_peek_tail(&vvs->rx_queue);
+		last_hdr = vsock_hdr(last_skb);
 
 		/* If there is space in the last packet queued, we copy the
 		 * new packet in its buffer. We avoid this if the last packet
@@ -1055,35 +1038,35 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
 		 * delimiter of SEQPACKET message, so 'pkt' is the first packet
 		 * of a new message.
 		 */
-		if ((pkt->len <= last_pkt->buf_len - last_pkt->len) &&
-		    !(le32_to_cpu(last_pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)) {
-			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
-			       pkt->len);
-			last_pkt->len += pkt->len;
+		if (skb->len < skb_tailroom(last_skb) &&
+		    !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) &&
+		    (vsock_hdr(skb)->type != VIRTIO_VSOCK_TYPE_DGRAM)) {
+			memcpy(skb_put(last_skb, skb->len), skb->data, skb->len);
 			free_pkt = true;
-			last_pkt->hdr.flags |= pkt->hdr.flags;
+			last_hdr->flags |= hdr->flags;
 			goto out;
 		}
 	}
 
-	list_add_tail(&pkt->list, &vvs->rx_queue);
+	__skb_queue_tail(&vvs->rx_queue, skb);
 
 out:
 	spin_unlock_bh(&vvs->rx_lock);
 	if (free_pkt)
-		virtio_transport_free_pkt(pkt);
+		kfree_skb(skb);
 }
 
 static int
 virtio_transport_recv_connected(struct sock *sk,
-				struct virtio_vsock_pkt *pkt)
+				struct sk_buff *skb)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 	int err = 0;
 
-	switch (le16_to_cpu(pkt->hdr.op)) {
+	switch (le16_to_cpu(hdr->op)) {
 	case VIRTIO_VSOCK_OP_RW:
-		virtio_transport_recv_enqueue(vsk, pkt);
+		virtio_transport_recv_enqueue(vsk, skb);
 		sk->sk_data_ready(sk);
 		return err;
 	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
@@ -1093,18 +1076,18 @@ virtio_transport_recv_connected(struct sock *sk,
 		sk->sk_write_space(sk);
 		break;
 	case VIRTIO_VSOCK_OP_SHUTDOWN:
-		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_RCV)
+		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_RCV)
 			vsk->peer_shutdown |= RCV_SHUTDOWN;
-		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND)
+		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_SEND)
 			vsk->peer_shutdown |= SEND_SHUTDOWN;
 		if (vsk->peer_shutdown == SHUTDOWN_MASK &&
 		    vsock_stream_has_data(vsk) <= 0 &&
 		    !sock_flag(sk, SOCK_DONE)) {
 			(void)virtio_transport_reset(vsk, NULL);
 			virtio_transport_do_close(vsk, true);
 		}
-		if (le32_to_cpu(pkt->hdr.flags))
+		if (le32_to_cpu(vsock_hdr(skb)->flags))
 			sk->sk_state_change(sk);
 		break;
 	case VIRTIO_VSOCK_OP_RST:
@@ -1115,28 +1097,30 @@ virtio_transport_recv_connected(struct sock *sk,
 		break;
 	}
 
-	virtio_transport_free_pkt(pkt);
+	kfree_skb(skb);
 	return err;
 }
 
 static void
 virtio_transport_recv_disconnecting(struct sock *sk,
-				    struct virtio_vsock_pkt *pkt)
+				    struct sk_buff *skb)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 
-	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
+	if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
 		virtio_transport_do_close(vsk, true);
 }
 
 static int
 virtio_transport_send_response(struct vsock_sock *vsk,
-			       struct virtio_vsock_pkt *pkt)
+			       struct sk_buff *skb)
 {
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RESPONSE,
-		.remote_cid = le64_to_cpu(pkt->hdr.src_cid),
-		.remote_port = le32_to_cpu(pkt->hdr.src_port),
+		.remote_cid = le64_to_cpu(hdr->src_cid),
+		.remote_port = le32_to_cpu(hdr->src_port),
 		.reply = true,
 		.vsk = vsk,
 	};
@@ -1145,10 +1129,11 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 }
 
 static bool virtio_transport_space_update(struct sock *sk,
-					  struct virtio_vsock_pkt *pkt)
+					  struct sk_buff *skb)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 	bool space_available;
 
 	/* Listener sockets are not associated with any transport, so we are
@@ -1161,8 +1146,8 @@ static bool virtio_transport_space_update(struct sock *sk,
 
 	/* buf_alloc and fwd_cnt is always included in the hdr */
 	spin_lock_bh(&vvs->tx_lock);
-	vvs->peer_buf_alloc = le32_to_cpu(pkt->hdr.buf_alloc);
-	vvs->peer_fwd_cnt = le32_to_cpu(pkt->hdr.fwd_cnt);
+	vvs->peer_buf_alloc = le32_to_cpu(hdr->buf_alloc);
+	vvs->peer_fwd_cnt = le32_to_cpu(hdr->fwd_cnt);
 	space_available = virtio_transport_has_space(vsk);
 	spin_unlock_bh(&vvs->tx_lock);
 	return space_available;
@@ -1170,27 +1155,28 @@ static bool virtio_transport_space_update(struct sock *sk,
 
 /* Handle server socket */
 static int
-virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
+virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 			     struct virtio_transport *t)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 	struct vsock_sock *vchild;
 	struct sock *child;
 	int ret;
 
-	if (le16_to_cpu(pkt->hdr.op) != VIRTIO_VSOCK_OP_REQUEST) {
-		virtio_transport_reset_no_sock(t, pkt);
+	if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) {
+		virtio_transport_reset_no_sock(t, skb);
 		return -EINVAL;
 	}
 
 	if (sk_acceptq_is_full(sk)) {
-		virtio_transport_reset_no_sock(t, pkt);
+		virtio_transport_reset_no_sock(t, skb);
 		return -ENOMEM;
 	}
 
 	child = vsock_create_connected(sk);
 	if (!child) {
-		virtio_transport_reset_no_sock(t, pkt);
+		virtio_transport_reset_no_sock(t, skb);
 		return -ENOMEM;
 	}
 
@@ -1201,10 +1187,10 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
 	child->sk_state = TCP_ESTABLISHED;
 
 	vchild = vsock_sk(child);
-	vsock_addr_init(&vchild->local_addr, le64_to_cpu(pkt->hdr.dst_cid),
-			le32_to_cpu(pkt->hdr.dst_port));
-	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid),
-			le32_to_cpu(pkt->hdr.src_port));
+	vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid),
+			le32_to_cpu(hdr->dst_port));
+	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid),
+			le32_to_cpu(hdr->src_port));
 
 	ret = vsock_assign_transport(vchild, vsk);
 	/* Transport assigned (looking at remote_addr) must be the same
@@ -1212,17 +1198,17 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
 	 */
 	if (ret || vchild->transport != &t->transport) {
 		release_sock(child);
-		virtio_transport_reset_no_sock(t, pkt);
+		virtio_transport_reset_no_sock(t, skb);
 		sock_put(child);
 		return ret;
 	}
 
-	if (virtio_transport_space_update(child, pkt))
+	if (virtio_transport_space_update(child, skb))
 		child->sk_write_space(child);
 
 	vsock_insert_connected(vchild);
 	vsock_enqueue_accept(sk, child);
-	virtio_transport_send_response(vchild, pkt);
+	virtio_transport_send_response(vchild, skb);
 
 	release_sock(child);
 
@@ -1240,29 +1226,30 @@ static bool virtio_transport_valid_type(u16 type)
 * lock.
 */
void virtio_transport_recv_pkt(struct virtio_transport *t,
-			       struct virtio_vsock_pkt *pkt)
+			       struct sk_buff *skb)
 {
 	struct sockaddr_vm src, dst;
 	struct vsock_sock *vsk;
 	struct sock *sk;
 	bool space_available;
+	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
 
-	vsock_addr_init(&src, le64_to_cpu(pkt->hdr.src_cid),
-			le32_to_cpu(pkt->hdr.src_port));
-	vsock_addr_init(&dst, le64_to_cpu(pkt->hdr.dst_cid),
-			le32_to_cpu(pkt->hdr.dst_port));
+	vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
+			le32_to_cpu(hdr->src_port));
+	vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
+			le32_to_cpu(hdr->dst_port));
 
 	trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
 					dst.svm_cid, dst.svm_port,
-					le32_to_cpu(pkt->hdr.len),
-					le16_to_cpu(pkt->hdr.type),
-					le16_to_cpu(pkt->hdr.op),
-					le32_to_cpu(pkt->hdr.flags),
-					le32_to_cpu(pkt->hdr.buf_alloc),
-					le32_to_cpu(pkt->hdr.fwd_cnt));
-
-	if (!virtio_transport_valid_type(le16_to_cpu(pkt->hdr.type))) {
-		(void)virtio_transport_reset_no_sock(t, pkt);
+					le32_to_cpu(hdr->len),
+					le16_to_cpu(hdr->type),
+					le16_to_cpu(hdr->op),
+					le32_to_cpu(hdr->flags),
+					le32_to_cpu(hdr->buf_alloc),
+					le32_to_cpu(hdr->fwd_cnt));
+
+	if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
+		(void)virtio_transport_reset_no_sock(t, skb);
 		goto free_pkt;
 	}
 
@@ -1273,13 +1260,13 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	if (!sk) {
 		sk = vsock_find_bound_socket(&dst);
 		if (!sk) {
-			(void)virtio_transport_reset_no_sock(t, pkt);
+			(void)virtio_transport_reset_no_sock(t, skb);
 			goto free_pkt;
 		}
 	}
 
-	if (virtio_transport_get_type(sk) != le16_to_cpu(pkt->hdr.type)) {
-		(void)virtio_transport_reset_no_sock(t, pkt);
+	if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
+		(void)virtio_transport_reset_no_sock(t, skb);
 		sock_put(sk);
 		goto free_pkt;
 	}
@@ -1290,13 +1277,13 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	/* Check if sk has been closed before lock_sock */
 	if (sock_flag(sk, SOCK_DONE)) {
-		(void)virtio_transport_reset_no_sock(t, pkt);
+		(void)virtio_transport_reset_no_sock(t, skb);
 		release_sock(sk);
 		sock_put(sk);
 		goto free_pkt;
 	}
 
-	space_available = virtio_transport_space_update(sk, pkt);
+	space_available = virtio_transport_space_update(sk, skb);
 
 	/* Update CID in case it has changed after a transport reset event */
 	if (vsk->local_addr.svm_cid != VMADDR_CID_ANY)
@@ -1307,23 +1294,23 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	switch (sk->sk_state) {
 	case TCP_LISTEN:
-		virtio_transport_recv_listen(sk, pkt, t);
-		virtio_transport_free_pkt(pkt);
+		virtio_transport_recv_listen(sk, skb, t);
+		kfree_skb(skb);
 		break;
 	case TCP_SYN_SENT:
-		virtio_transport_recv_connecting(sk, pkt);
-		virtio_transport_free_pkt(pkt);
+		virtio_transport_recv_connecting(sk, skb);
+		kfree_skb(skb);
 		break;
 	case TCP_ESTABLISHED:
-		virtio_transport_recv_connected(sk, pkt);
+		virtio_transport_recv_connected(sk, skb);
 		break;
 	case TCP_CLOSING:
-		virtio_transport_recv_disconnecting(sk, pkt);
-		virtio_transport_free_pkt(pkt);
+		virtio_transport_recv_disconnecting(sk, skb);
+		kfree_skb(skb);
 		break;
 	default:
-		(void)virtio_transport_reset_no_sock(t, pkt);
-		virtio_transport_free_pkt(pkt);
+		(void)virtio_transport_reset_no_sock(t, skb);
+		kfree_skb(skb);
 		break;
 	}
 
@@ -1336,16 +1323,42 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	return;
 
 free_pkt:
-	virtio_transport_free_pkt(pkt);
+	kfree_skb(skb);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
 
-void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
+/* Remove skbs found in a queue that have a vsk that matches.
+ *
+ * Each skb is freed.
+ *
+ * Returns the count of skbs that were reply packets.
+ */ +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue) { - kfree(pkt->buf); - kfree(pkt); + int cnt =3D 0; + struct sk_buff *skb, *tmp; + struct sk_buff_head freeme; + + skb_queue_head_init(&freeme); + + spin_lock_bh(&queue->lock); + skb_queue_walk_safe(queue, skb, tmp) { + if (vsock_sk(skb->sk) !=3D vsk) + continue; + + __skb_unlink(skb, queue); + skb_queue_tail(&freeme, skb); + + if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY) + cnt++; + } + spin_unlock_bh(&queue->lock); + + skb_queue_purge(&freeme); + + return cnt; } -EXPORT_SYMBOL_GPL(virtio_transport_free_pkt); +EXPORT_SYMBOL_GPL(virtio_transport_purge_skbs); =20 MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Asias He"); diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 169a8cf65b39..906f7cdff65e 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -16,7 +16,7 @@ struct vsock_loopback { struct workqueue_struct *workqueue; =20 spinlock_t pkt_list_lock; /* protects pkt_list */ - struct list_head pkt_list; + struct sk_buff_head pkt_queue; struct work_struct pkt_work; }; =20 @@ -27,13 +27,13 @@ static u32 vsock_loopback_get_local_cid(void) return VMADDR_CID_LOCAL; } =20 -static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt) +static int vsock_loopback_send_pkt(struct sk_buff *skb) { struct vsock_loopback *vsock =3D &the_vsock_loopback; - int len =3D pkt->len; + int len =3D skb->len; =20 spin_lock_bh(&vsock->pkt_list_lock); - list_add_tail(&pkt->list, &vsock->pkt_list); + skb_queue_tail(&vsock->pkt_queue, skb); spin_unlock_bh(&vsock->pkt_list_lock); =20 queue_work(vsock->workqueue, &vsock->pkt_work); @@ -44,21 +44,8 @@ static int vsock_loopback_send_pkt(struct virtio_vsock_p= kt *pkt) static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk) { struct vsock_loopback *vsock =3D &the_vsock_loopback; - struct virtio_vsock_pkt *pkt, *n; - LIST_HEAD(freeme); =20 - spin_lock_bh(&vsock->pkt_list_lock); - 
list_for_each_entry_safe(pkt, n, &vsock->pkt_list, list) { - if (pkt->vsk !=3D vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + virtio_transport_purge_skbs(vsk, &vsock->pkt_queue); =20 return 0; } @@ -121,20 +108,20 @@ static void vsock_loopback_work(struct work_struct *w= ork) { struct vsock_loopback *vsock =3D container_of(work, struct vsock_loopback, pkt_work); - LIST_HEAD(pkts); + struct sk_buff_head pkts; + + skb_queue_head_init(&pkts); =20 spin_lock_bh(&vsock->pkt_list_lock); - list_splice_init(&vsock->pkt_list, &pkts); + skb_queue_splice_init(&vsock->pkt_queue, &pkts); spin_unlock_bh(&vsock->pkt_list_lock); =20 - while (!list_empty(&pkts)) { - struct virtio_vsock_pkt *pkt; + while (!skb_queue_empty(&pkts)) { + struct sk_buff *skb; =20 - pkt =3D list_first_entry(&pkts, struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - - virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&loopback_transport, pkt); + skb =3D skb_dequeue(&pkts); + virtio_transport_deliver_tap_pkt(skb); + virtio_transport_recv_pkt(&loopback_transport, skb); } } =20 @@ -148,7 +135,7 @@ static int __init vsock_loopback_init(void) return -ENOMEM; =20 spin_lock_init(&vsock->pkt_list_lock); - INIT_LIST_HEAD(&vsock->pkt_list); + skb_queue_head_init(&vsock->pkt_queue); INIT_WORK(&vsock->pkt_work, vsock_loopback_work); =20 ret =3D vsock_core_register(&loopback_transport.transport, @@ -166,19 +153,13 @@ static int __init vsock_loopback_init(void) static void __exit vsock_loopback_exit(void) { struct vsock_loopback *vsock =3D &the_vsock_loopback; - struct virtio_vsock_pkt *pkt; =20 vsock_core_unregister(&loopback_transport.transport); =20 flush_work(&vsock->pkt_work); =20 spin_lock_bh(&vsock->pkt_list_lock); - while (!list_empty(&vsock->pkt_list)) { - pkt =3D list_first_entry(&vsock->pkt_list, - struct 
 virtio_vsock_pkt, list);
-		list_del(&pkt->list);
-		virtio_transport_free_pkt(pkt);
-	}
+	skb_queue_purge(&vsock->pkt_queue);
	spin_unlock_bh(&vsock->pkt_list_lock);

	destroy_workqueue(vsock->workqueue);
-- 
2.35.1

From: Bobby Eshleman
Cc: Bobby Eshleman, Cong Wang, Jiang Wang, Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, "K. Y.
Srinivasan", Haiyang Zhang, Stephen Hemminger, Wei Liu, Dexuan Cui, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org
Subject: [PATCH 2/6] vsock: return errors other than -ENOMEM to socket
Date: Mon, 15 Aug 2022 10:56:05 -0700

This commit allows vsock implementations to return errors other than -ENOMEM to the socket layer. One immediate effect is that when the sk_sndbuf threshold is reached, -EAGAIN is returned and userspace may throttle appropriately. As a result, a known issue with uperf is resolved [1]. Additionally, to preserve legacy behavior for non-virtio implementations, hyperv/vmci force errors to be -ENOMEM so that behavior is unchanged.
[1]: https://gitlab.com/vsock/vsock/-/issues/1

Signed-off-by: Bobby Eshleman
Reported-by: kernel test robot
---
 include/linux/virtio_vsock.h            | 3 +++
 net/vmw_vsock/af_vsock.c                | 3 ++-
 net/vmw_vsock/hyperv_transport.c        | 2 +-
 net/vmw_vsock/virtio_transport_common.c | 3 ---
 net/vmw_vsock/vmci_transport.c          | 9 ++++++++-
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 17ed01466875..9a37eddbb87a 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -8,6 +8,9 @@
 #include
 #include

+/* Threshold for detecting small packets to copy */
+#define GOOD_COPY_LEN 128
+
 enum virtio_vsock_metadata_flags {
	VIRTIO_VSOCK_METADATA_FLAGS_REPLY = BIT(0),
	VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED = BIT(1),

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e348b2d09eac..1893f8aafa48 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1844,8 +1844,9 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
		written = transport->stream_enqueue(vsk, msg, len - total_written);
	}
+
	if (written < 0) {
-		err = -ENOMEM;
+		err = written;
		goto out_err;
	}

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index fd98229e3db3..e99aea571f6f 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -687,7 +687,7 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
	if (bytes_written)
		ret = bytes_written;
	kfree(send_buf);
-	return ret;
+	return ret < 0 ?
-ENOMEM : ret;
}

 static s64 hvs_stream_has_data(struct vsock_sock *vsk)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 920578597bb9..d5780599fe93 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -23,9 +23,6 @@
 /* How long to wait for graceful shutdown of a connection */
 #define VSOCK_CLOSE_TIMEOUT (8 * HZ)

-/* Threshold for detecting small packets to copy */
-#define GOOD_COPY_LEN 128
-
 static const struct virtio_transport *
 virtio_transport_get_ops(struct vsock_sock *vsk)
 {

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b14f0ed7427b..c927a90dc859 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1838,7 +1838,14 @@ static ssize_t vmci_transport_stream_enqueue(
	struct msghdr *msg,
	size_t len)
{
-	return vmci_qpair_enquev(vmci_trans(vsk)->qpair, msg, len, 0);
+	int err;
+
+	err = vmci_qpair_enquev(vmci_trans(vsk)->qpair, msg, len, 0);
+
+	if (err < 0)
+		err = -ENOMEM;
+
+	return err;
}

 static s64 vmci_transport_stream_has_data(struct vsock_sock *vsk)
-- 
2.35.1

From: Bobby Eshleman
Subject: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock
Date: Mon, 15 Aug 2022 10:56:06 -0700

In order to support usage of qdisc on vsock traffic, this commit introduces a struct net_device to vhost and virtio vsock. Two new devices are created, vhost-vsock for vhost and virtio-vsock for virtio. The devices are attached to the respective transports. To bypass the device, the user may "down" the associated network interface using common tools. For example, "ip link set dev virtio-vsock down" lets vsock bypass the net_device and qdisc entirely, simply using the FIFO logic of the prior implementation. For both hosts and guests, there is one device for all G2H vsock sockets and one device for all H2G vsock sockets. This makes sense for guests because the driver only supports a single vsock channel (one pair of TX/RX virtqueues), so one device and qdisc fits. For hosts, this may not seem ideal for some workloads.
However, it is possible to use a multi-queue qdisc, where a given queue is responsible for a range of sockets. This seems to be a better solution than having one device per socket, which may yield a very large number of devices and qdiscs, all of which are dynamically being created and destroyed. Because of this dynamism, it would also require a complex policy management daemon, as devices would constantly be spun up and down as sockets were created and destroyed. To avoid this, one device and qdisc also applies to all H2G sockets.

Signed-off-by: Bobby Eshleman
---
 drivers/vhost/vsock.c                   |  19 +++-
 include/linux/virtio_vsock.h            |  10 +++
 net/vmw_vsock/virtio_transport.c        |  19 +++-
 net/vmw_vsock/virtio_transport_common.c | 112 +++++++++++++++++++++++-
 4 files changed, 152 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f8601d93d94d..b20ddec2664b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -927,13 +927,30 @@ static int __init vhost_vsock_init(void)
				  VSOCK_TRANSPORT_F_H2G);
	if (ret < 0)
		return ret;
-	return misc_register(&vhost_vsock_misc);
+
+	ret = virtio_transport_init(&vhost_transport, "vhost-vsock");
+	if (ret < 0)
+		goto out_unregister;
+
+	ret = misc_register(&vhost_vsock_misc);
+	if (ret < 0)
+		goto out_transport_exit;
+	return ret;
+
+out_transport_exit:
+	virtio_transport_exit(&vhost_transport);
+
+out_unregister:
+	vsock_core_unregister(&vhost_transport.transport);
+	return ret;
+
};

static void __exit vhost_vsock_exit(void)
{
	misc_deregister(&vhost_vsock_misc);
	vsock_core_unregister(&vhost_transport.transport);
+	virtio_transport_exit(&vhost_transport);
};

module_init(vhost_vsock_init);

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 9a37eddbb87a..5d7e7fbd75f8 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -91,10 +91,20 @@ struct virtio_transport {
	/* This must be the first field */
	struct vsock_transport transport;
+	/* Used almost exclusively for qdisc */
+	struct net_device *dev;
+
	/* Takes ownership of the packet */
	int (*send_pkt)(struct sk_buff *skb);
};

+int
+virtio_transport_init(struct virtio_transport *t,
+		      const char *name);
+
+void
+virtio_transport_exit(struct virtio_transport *t);
+
 ssize_t
 virtio_transport_stream_dequeue(struct vsock_sock *vsk,
				struct msghdr *msg,

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 3bb293fd8607..c6212eb38d3c 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -131,7 +131,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
		 * the vq
		 */
		if (ret < 0) {
-			skb_queue_head(&vsock->send_pkt_queue, skb);
+			spin_lock_bh(&vsock->send_pkt_queue.lock);
+			__skb_queue_head(&vsock->send_pkt_queue, skb);
+			spin_unlock_bh(&vsock->send_pkt_queue.lock);
			break;
		}

@@ -676,7 +678,9 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
	kfree_skb(skb);
	mutex_unlock(&vsock->tx_lock);

-	skb_queue_purge(&vsock->send_pkt_queue);
+	spin_lock_bh(&vsock->send_pkt_queue.lock);
+	__skb_queue_purge(&vsock->send_pkt_queue);
+	spin_unlock_bh(&vsock->send_pkt_queue.lock);

	/* Delete virtqueues and flush outstanding callbacks if any */
	vdev->config->del_vqs(vdev);
@@ -760,6 +764,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
	flush_work(&vsock->event_work);
	flush_work(&vsock->send_pkt_work);

+	virtio_transport_exit(&virtio_transport);
+
	mutex_unlock(&the_virtio_vsock_mutex);

	kfree(vsock);
@@ -844,12 +850,18 @@ static int __init virtio_vsock_init(void)
	if (ret)
		goto out_wq;

-	ret = register_virtio_driver(&virtio_vsock_driver);
+	ret = virtio_transport_init(&virtio_transport, "virtio-vsock");
	if (ret)
		goto out_vci;

+	ret = register_virtio_driver(&virtio_vsock_driver);
+	if (ret)
+		goto out_transport;
+
	return 0;

+out_transport:
+	virtio_transport_exit(&virtio_transport);
 out_vci:
	vsock_core_unregister(&virtio_transport.transport);
 out_wq:
@@ -861,6 +873,7 @@ static void __exit virtio_vsock_exit(void)
{
	unregister_virtio_driver(&virtio_vsock_driver);
	vsock_core_unregister(&virtio_transport.transport);
+	virtio_transport_exit(&virtio_transport);
	destroy_workqueue(virtio_vsock_workqueue);
}

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index d5780599fe93..bdf16fff054f 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -16,6 +16,7 @@

 #include
 #include
+#include

 #define CREATE_TRACE_POINTS
 #include
@@ -23,6 +24,93 @@
 /* How long to wait for graceful shutdown of a connection */
 #define VSOCK_CLOSE_TIMEOUT (8 * HZ)

+struct virtio_transport_priv {
+	struct virtio_transport *trans;
+};
+
+static netdev_tx_t virtio_transport_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct virtio_transport *t =
+		((struct virtio_transport_priv *)netdev_priv(dev))->trans;
+	int ret;
+
+	ret = t->send_pkt(skb);
+	if (unlikely(ret == -ENODEV))
+		return NETDEV_TX_BUSY;
+
+	return NETDEV_TX_OK;
+}
+
+const struct net_device_ops virtio_transport_netdev_ops = {
+	.ndo_start_xmit = virtio_transport_start_xmit,
+};
+
+static void virtio_transport_setup(struct net_device *dev)
+{
+	dev->netdev_ops = &virtio_transport_netdev_ops;
+	dev->needs_free_netdev = true;
+	dev->flags = IFF_NOARP;
+	dev->mtu = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
+	dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
+}
+
+static int ifup(struct net_device *dev)
+{
+	int ret;
+
+	rtnl_lock();
+	ret = dev_open(dev, NULL) ? -ENOMEM : 0;
+	rtnl_unlock();
+
+	return ret;
+}
+
+/* virtio_transport_init - initialize a virtio vsock transport layer
+ *
+ * @t: ptr to the virtio transport struct to initialize
+ * @name: the name of the net_device to be created.
+ *
+ * Return 0 on success, otherwise negative errno.
+ */
+int virtio_transport_init(struct virtio_transport *t, const char *name)
+{
+	struct virtio_transport_priv *priv;
+	int ret;
+
+	t->dev = alloc_netdev(sizeof(*priv), name, NET_NAME_UNKNOWN, virtio_transport_setup);
+	if (!t->dev)
+		return -ENOMEM;
+
+	priv = netdev_priv(t->dev);
+	priv->trans = t;
+
+	ret = register_netdev(t->dev);
+	if (ret < 0)
+		goto out_free_netdev;
+
+	ret = ifup(t->dev);
+	if (ret < 0)
+		goto out_unregister_netdev;
+
+	return 0;
+
+out_unregister_netdev:
+	unregister_netdev(t->dev);
+
+out_free_netdev:
+	free_netdev(t->dev);
+
+	return ret;
+}
+
+void virtio_transport_exit(struct virtio_transport *t)
+{
+	if (t->dev) {
+		unregister_netdev(t->dev);
+		free_netdev(t->dev);
+	}
+}
+
 static const struct virtio_transport *
 virtio_transport_get_ops(struct vsock_sock *vsk)
 {
@@ -147,6 +235,24 @@ static u16 virtio_transport_get_type(struct sock *sk)
	return VIRTIO_VSOCK_TYPE_SEQPACKET;
}

+/* Return pkt->len on success, otherwise negative errno */
+static int virtio_transport_send_pkt(const struct virtio_transport *t, struct sk_buff *skb)
+{
+	int ret;
+	int len = skb->len;
+
+	if (unlikely(!t->dev || !(t->dev->flags & IFF_UP)))
+		return t->send_pkt(skb);
+
+	skb->dev = t->dev;
+	ret = dev_queue_xmit(skb);
+
+	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN))
+		return len;
+
+	return -ENOMEM;
+}
+
 /* This function can only be used on connecting/connected sockets,
  * since a socket assigned to a transport is required.
  *
@@ -202,9 +308,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,

	virtio_transport_inc_tx_pkt(vvs, skb);

-	err = t_ops->send_pkt(skb);
-
-	return err < 0 ?
-ENOMEM : err;
+	return virtio_transport_send_pkt(t_ops, skb);
}

 static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
@@ -834,7 +938,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
		return -ENOTCONN;
	}

-	return t->send_pkt(reply);
+	return virtio_transport_send_pkt(t, reply);
}

 /* This function should be called with sk_lock held and SOCK_DONE set */
-- 
2.35.1

From: Bobby Eshleman
Subject: [PATCH 4/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
Date: Mon, 15 Aug 2022 10:56:07 -0700

This commit adds a feature bit for virtio vsock to support datagrams.

Signed-off-by: Jiang Wang
Signed-off-by: Bobby Eshleman
---
 drivers/vhost/vsock.c             | 3 ++-
 include/uapi/linux/virtio_vsock.h | 1 +
 net/vmw_vsock/virtio_transport.c  | 8 ++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index b20ddec2664b..a5d1bdb786fe 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -32,7 +32,8 @@ enum {
	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
-			       (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
+			       (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
+			       (1ULL << VIRTIO_VSOCK_F_DGRAM)
};

 enum {

diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 64738838bee5..857df3a3a70d 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -40,6 +40,7 @@

 /* The feature bitmap for virtio vsock */
 #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
+#define VIRTIO_VSOCK_F_DGRAM     2 /* Host support dgram vsock */

 struct virtio_vsock_config {
	__le64 guest_cid;

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index c6212eb38d3c..073314312683 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -35,6
+35,7 @@ static struct virtio_transport virtio_transport; /* forward declaration */
 struct virtio_vsock {
	struct virtio_device *vdev;
	struct virtqueue *vqs[VSOCK_VQ_MAX];
+	bool has_dgram;

	/* Virtqueue processing is deferred to a workqueue */
	struct work_struct tx_work;
@@ -709,7 +710,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
	}

	vsock->vdev = vdev;
-	vsock->rx_buf_nr = 0;
	vsock->rx_buf_max_nr = 0;
	atomic_set(&vsock->queued_replies, 0);
@@ -726,6 +726,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
		vsock->seqpacket_allow = true;

+	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
+		vsock->has_dgram = true;
+
	vdev->priv = vsock;

	ret = virtio_vsock_vqs_init(vsock);
@@ -820,7 +823,8 @@ static struct virtio_device_id id_table[] = {
};

 static unsigned int features[] = {
-	VIRTIO_VSOCK_F_SEQPACKET
+	VIRTIO_VSOCK_F_SEQPACKET,
+	VIRTIO_VSOCK_F_DGRAM
};

 static struct virtio_driver virtio_vsock_driver = {
-- 
2.35.1

From: Bobby Eshleman
Subject: [PATCH 5/6] virtio/vsock: add support for dgram
Date: Mon, 15 Aug 2022 10:56:08 -0700

This patch supports dgram in virtio and on the vhost side.
Signed-off-by: Jiang Wang
Signed-off-by: Bobby Eshleman
Reported-by: kernel test robot
---
 drivers/vhost/vsock.c                   |   2 +-
 include/net/af_vsock.h                  |   2 +
 include/uapi/linux/virtio_vsock.h       |   1 +
 net/vmw_vsock/af_vsock.c                |  26 +++-
 net/vmw_vsock/virtio_transport.c        |   2 +-
 net/vmw_vsock/virtio_transport_common.c | 173 ++++++++++++++++++++++--
 6 files changed, 186 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a5d1bdb786fe..3dc72a5647ca 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
	int ret;

	ret = vsock_core_register(&vhost_transport.transport,
-				  VSOCK_TRANSPORT_F_H2G);
+				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
	if (ret < 0)
		return ret;

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 1c53c4c4d88f..37e55c81e4df 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -78,6 +78,8 @@ struct vsock_sock {
 s64 vsock_stream_has_data(struct vsock_sock *vsk);
 s64 vsock_stream_has_space(struct vsock_sock *vsk);
 struct sock *vsock_create_connected(struct sock *parent);
+int vsock_bind_stream(struct vsock_sock *vsk,
+		      struct sockaddr_vm *addr);

 /**** TRANSPORT ****/

diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 857df3a3a70d..0975b9c88292 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
 enum virtio_vsock_type {
	VIRTIO_VSOCK_TYPE_STREAM = 1,
	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
};

 enum virtio_vsock_op {

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 1893f8aafa48..87e4ae1866d3 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
	return 0;
}

+int vsock_bind_stream(struct vsock_sock *vsk,
+		      struct sockaddr_vm *addr)
+{
+	int
 retval;
+
+	spin_lock_bh(&vsock_table_lock);
+	retval = __vsock_bind_connectible(vsk, addr);
+	spin_unlock_bh(&vsock_table_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(vsock_bind_stream);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
			      struct sockaddr_vm *addr)
{
@@ -2363,11 +2376,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
	}

	if (features & VSOCK_TRANSPORT_F_DGRAM) {
-		if (t_dgram) {
-			err = -EBUSY;
-			goto err_busy;
+		/* TODO: always choose the G2H variant over others, support nesting later */
+		if (features & VSOCK_TRANSPORT_F_G2H) {
+			if (t_dgram)
+				pr_warn("virtio_vsock: t_dgram already set\n");
+			t_dgram = t;
+		}
+
+		if (!t_dgram) {
+			t_dgram = t;
		}
-		t_dgram = t;
	}

	if (features & VSOCK_TRANSPORT_F_LOCAL) {

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 073314312683..d4526ca462d2 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
		return -ENOMEM;

	ret = vsock_core_register(&virtio_transport.transport,
-				  VSOCK_TRANSPORT_F_G2H);
+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
	if (ret)
		goto out_wq;

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index bdf16fff054f..aedb48728677 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -229,7 +229,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);

 static u16 virtio_transport_get_type(struct sock *sk)
{
-	if (sk->sk_type == SOCK_STREAM)
+	if (sk->sk_type == SOCK_DGRAM)
+		return VIRTIO_VSOCK_TYPE_DGRAM;
+	else if (sk->sk_type == SOCK_STREAM)
		return VIRTIO_VSOCK_TYPE_STREAM;
	else
		return VIRTIO_VSOCK_TYPE_SEQPACKET;
@@ -287,22 +289,29 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
	vvs = vsk->trans;

	/* we can send less than pkt_len bytes */
-	if (pkt_len >
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) - pkt_len =3D VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { + if (info->type !=3D VIRTIO_VSOCK_TYPE_DGRAM) + pkt_len =3D VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; + else + return 0; + } =20 - /* virtio_transport_get_credit might return less than pkt_len credit */ - pkt_len =3D virtio_transport_get_credit(vvs, pkt_len); + if (info->type !=3D VIRTIO_VSOCK_TYPE_DGRAM) { + /* virtio_transport_get_credit might return less than pkt_len credit */ + pkt_len =3D virtio_transport_get_credit(vvs, pkt_len); =20 - /* Do not send zero length OP_RW pkt */ - if (pkt_len =3D=3D 0 && info->op =3D=3D VIRTIO_VSOCK_OP_RW) - return pkt_len; + /* Do not send zero length OP_RW pkt */ + if (pkt_len =3D=3D 0 && info->op =3D=3D VIRTIO_VSOCK_OP_RW) + return pkt_len; + } =20 skb =3D virtio_transport_alloc_skb(info, pkt_len, src_cid, src_port, dst_cid, dst_port, &err); if (!skb) { - virtio_transport_put_credit(vvs, pkt_len); + if (info->type !=3D VIRTIO_VSOCK_TYPE_DGRAM) + virtio_transport_put_credit(vvs, pkt_len); return err; } =20 @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *= vsk, } EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue); =20 +static ssize_t +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk, + struct msghdr *msg, size_t len) +{ + struct virtio_vsock_sock *vvs =3D vsk->trans; + struct sk_buff *skb; + size_t total =3D 0; + u32 free_space; + int err =3D -EFAULT; + + spin_lock_bh(&vvs->rx_lock); + if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) { + skb =3D __skb_dequeue(&vvs->rx_queue); + + total =3D len; + if (total > skb->len - vsock_metadata(skb)->off) + total =3D skb->len - vsock_metadata(skb)->off; + else if (total < skb->len - vsock_metadata(skb)->off) + msg->msg_flags |=3D MSG_TRUNC; + + /* sk_lock is held by caller so no one else can dequeue. + * Unlock rx_lock since memcpy_to_msg() may sleep. 
+ */ + spin_unlock_bh(&vvs->rx_lock); + + err =3D memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, total); + if (err) + return err; + + spin_lock_bh(&vvs->rx_lock); + + virtio_transport_dec_rx_pkt(vvs, skb); + consume_skb(skb); + } + + free_space =3D vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); + + spin_unlock_bh(&vvs->rx_lock); + + if (total > 0 && msg->msg_name) { + /* Provide the address of the sender. */ + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name); + + vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid), + le32_to_cpu(vsock_hdr(skb)->src_port)); + msg->msg_namelen =3D sizeof(*vm_addr); + } + return total; +} + +static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk) +{ + return virtio_transport_stream_has_data(vsk); +} + int virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk, struct msghdr *msg, @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg, size_t len, int flags) { - return -EOPNOTSUPP; + struct sock *sk; + size_t err =3D 0; + long timeout; + + DEFINE_WAIT(wait); + + sk =3D &vsk->sk; + err =3D 0; + + if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK) + return -EOPNOTSUPP; + + lock_sock(sk); + + if (!len) + goto out; + + timeout =3D sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + + while (1) { + s64 ready; + + prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + ready =3D virtio_transport_dgram_has_data(vsk); + + if (ready =3D=3D 0) { + if (timeout =3D=3D 0) { + err =3D -EAGAIN; + finish_wait(sk_sleep(sk), &wait); + break; + } + + release_sock(sk); + timeout =3D schedule_timeout(timeout); + lock_sock(sk); + + if (signal_pending(current)) { + err =3D sock_intr_errno(timeout); + finish_wait(sk_sleep(sk), &wait); + break; + } else if (timeout =3D=3D 0) { + err =3D -EAGAIN; + finish_wait(sk_sleep(sk), &wait); + break; + } + } else { + finish_wait(sk_sleep(sk), &wait); + + if (ready < 0) { + err =3D -ENOMEM; + goto out; + } + + err =3D 
virtio_transport_dgram_do_dequeue(vsk, msg, len); + break; + } + } +out: + release_sock(sk); + return err; } EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue); =20 @@ -819,13 +942,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow); int virtio_transport_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr) { - return -EOPNOTSUPP; + return vsock_bind_stream(vsk, addr); } EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind); =20 bool virtio_transport_dgram_allow(u32 cid, u32 port) { - return false; + return true; } EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow); =20 @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, struct msghdr *msg, size_t dgram_len) { - return -EOPNOTSUPP; + struct virtio_vsock_pkt_info info =3D { + .op =3D VIRTIO_VSOCK_OP_RW, + .msg =3D msg, + .pkt_len =3D dgram_len, + .vsk =3D vsk, + .remote_cid =3D remote_addr->svm_cid, + .remote_port =3D remote_addr->svm_port, + }; + + return virtio_transport_send_pkt_info(vsk, &info); } EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue); =20 @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct sock *sk, struct virtio_vsock_hdr *hdr =3D vsock_hdr(skb); int err =3D 0; =20 + if (le16_to_cpu(vsock_hdr(skb)->type) =3D=3D VIRTIO_VSOCK_TYPE_DGRAM) { + virtio_transport_recv_enqueue(vsk, skb); + sk->sk_data_ready(sk); + return err; + } + switch (le16_to_cpu(hdr->op)) { case VIRTIO_VSOCK_OP_RW: virtio_transport_recv_enqueue(vsk, skb); @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock *sk, struct = sk_buff *skb, static bool virtio_transport_valid_type(u16 type) { return (type =3D=3D VIRTIO_VSOCK_TYPE_STREAM) || - (type =3D=3D VIRTIO_VSOCK_TYPE_SEQPACKET); + (type =3D=3D VIRTIO_VSOCK_TYPE_SEQPACKET) || + (type =3D=3D VIRTIO_VSOCK_TYPE_DGRAM); } =20 /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mut= ex @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct virtio_transpo= rt *t, goto free_pkt; } =20 + if (sk->sk_type =3D=3D SOCK_DGRAM) { + 
virtio_transport_recv_connected(sk, skb); + goto out; + } + space_available =3D virtio_transport_space_update(sk, skb); =20 /* Update CID in case it has changed after a transport reset event */ @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct virtio_transpor= t *t, break; } =20 +out: release_sock(sk); =20 /* Release refcnt obtained when we fetched this socket out of the --=20 2.35.1 From nobody Sat Apr 11 05:14:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C0ADC00140 for ; Mon, 15 Aug 2022 17:57:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231868AbiHOR46 (ORCPT ); Mon, 15 Aug 2022 13:56:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232119AbiHOR4q (ORCPT ); Mon, 15 Aug 2022 13:56:46 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3167B28710; Mon, 15 Aug 2022 10:56:45 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id a8so7542993pjg.5; Mon, 15 Aug 2022 10:56:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc; bh=evBXpseCv44V8dtEGpGZdRtOLqS1uCzMoDoctnxfZ+A=; b=Z511c8JJmwP9apx0qVn6c0qKgKcguer5xLrmov8owAzmBp3kqtfhXzQf6NIoBELoMm pG6QeARhWBSSSFGjRHzl+hQJbLMS+NsgbLk+tp22XSkUOR0f4LO4VbR6FHzSHhAGbOlR n9RQPw2G3W+v1hJ+QwCd6RyPuFbObm5NiaWAid1rQ0UrJ4+X3Di/Au+fbjCOHjY26gNL 07utfS4OqQ+TV9P+nU0CoPns4CWu3kQCurxbGcHZ5ohEEWqEA3Arf8yV1Uh2P5C84dUf XZCQGsmULagNT6CpMTNoloJB4NR1+k0nIxRMwfCH0fwAS4IlsvkQYFftgSVN3rRSu+oO FE+g== X-Google-DKIM-Signature: 
From: Bobby Eshleman
Cc: Bobby Eshleman, Cong Wang, Jiang Wang, Stefano Garzarella,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 6/6] vsock_test: add tests for vsock dgram
Date: Mon, 15 Aug 2022 10:56:09 -0700

From: Jiang Wang

Added test cases for vsock dgram types.
Signed-off-by: Jiang Wang
---
 tools/testing/vsock/util.c       | 105 +++++++++++++++++
 tools/testing/vsock/util.h       |   4 +
 tools/testing/vsock/vsock_test.c | 195 +++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 2acbb7703c6a..d2f5b223bf85 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -260,6 +260,57 @@ void send_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Transmit one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+		 int flags)
+{
+	const uint8_t byte = 'A';
+	ssize_t nwritten;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+				  len);
+		timeout_check("write");
+	} while (nwritten < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nwritten != -1) {
+			fprintf(stderr, "bogus sendto(2) return value %zd\n",
+				nwritten);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("write");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nwritten < 0) {
+		perror("write");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while sending byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten != sizeof(byte)) {
+		fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Receive one byte and check the return value.
  *
  * expected_ret:
@@ -313,6 +364,60 @@ void recv_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+		   int expected_ret, int flags)
+{
+	uint8_t byte;
+	ssize_t nread;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+		timeout_check("read");
+	} while (nread < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nread != -1) {
+			fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+				nread);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("read");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nread < 0) {
+		perror("read");
+		exit(EXIT_FAILURE);
+	}
+	if (nread == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while receiving byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nread != sizeof(byte)) {
+		fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+		exit(EXIT_FAILURE);
+	}
+	if (byte != 'A') {
+		fprintf(stderr, "unexpected byte read %c\n", byte);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Run test cases. The program terminates if a failure occurs.
  */
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index a3375ad2fb7f..7213f2a51c1e 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -43,7 +43,11 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
 			   struct sockaddr_vm *clientaddrp);
 void vsock_wait_remote_close(int fd);
 void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+		 int flags);
 void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+		   int expected_ret, int flags);
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts);
 void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index dc577461afc2..640379f1b462 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -201,6 +201,115 @@ static void test_stream_server_close_server(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+	printf("got message from cid:%u, port %u\n", addr.svm.svm_cid,
+	       addr.svm.svm_port);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+	int ret;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = connect(fd, &addr.sa, sizeof(addr.svm));
+	if (ret < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	send_byte(fd, 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+	test_dgram_sendto_server(opts);
+}
+
 /* With the standard socket sizes, VMCI is able to support about 100
  * concurrent stream connections.
  */
@@ -254,6 +363,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
 		close(fds[i]);
 }
 
+static void test_dgram_multiconn_client(const struct test_opts *opts)
+{
+	int fds[MULTICONN_NFDS];
+	int i;
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++) {
+		fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+		if (fds[i] < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		close(fds[i]);
+}
+
+static void test_dgram_multiconn_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+	int i;
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
 static void test_stream_msg_peek_client(const struct test_opts *opts)
 {
 	int fd;
@@ -646,6 +826,21 @@ static struct test_case test_cases[] = {
 		.run_client = test_seqpacket_invalid_rec_buffer_client,
 		.run_server = test_seqpacket_invalid_rec_buffer_server,
 	},
+	{
+		.name = "SOCK_DGRAM client sendto",
+		.run_client = test_dgram_sendto_client,
+		.run_server = test_dgram_sendto_server,
+	},
+	{
+		.name = "SOCK_DGRAM client connect",
+		.run_client = test_dgram_connect_client,
+		.run_server = test_dgram_connect_server,
+	},
+	{
+		.name = "SOCK_DGRAM multiple connections",
+		.run_client = test_dgram_multiconn_client,
+		.run_server = test_dgram_multiconn_server,
+	},
 	{},
 };
 
-- 
2.35.1