Hey everybody,
This series introduces datagrams, packet scheduling, and sk_buff usage
to virtio vsock.
The usage of struct sk_buff benefits users by a) preparing vsock to use
other related systems that require sk_buff, such as sockmap and qdisc,
b) supporting basic congestion control via sock_alloc_send_skb, and c)
reducing copying when delivering packets to TAP.
The socket layer no longer forces errors to be -ENOMEM, as userspace
typically expects -EAGAIN when the sk_sndbuf threshold is reached and
messages are sent with the MSG_DONTWAIT option.
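
For illustration, this is the behavior a non-blocking sender now sees
(a minimal userspace sketch, not code from this series):

#include <errno.h>
#include <sys/socket.h>

/* Sketch: with MSG_DONTWAIT, a full send buffer now surfaces as EAGAIN
 * rather than ENOMEM, so callers can back off and retry.
 */
static ssize_t send_dontwait(int fd, const void *buf, size_t len)
{
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT);

        if (n < 0 && errno == EAGAIN) {
                /* sk_sndbuf exhausted: retry later, e.g. after poll()
                 * reports POLLOUT.
                 */
        }
        return n;
}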
The datagram work is based on previous patches by Jiang Wang[1].
The introduction of datagrams creates a transport layer fairness issue
where datagrams may freely starve streams of queue access. This happens
because, unlike streams, datagrams lack the transactions necessary for
calculating credits and throttling.
Previous proposals introduced changes to the spec to add an additional
virtqueue pair for datagrams[1]. Although that solution works, using
Linux's qdisc for packet scheduling leverages already existing systems,
avoids the need to change the virtio specification, and gives
additional capabilities. The usage of SFQ or fq_codel, for example, may
solve the transport layer starvation problem. It is easy to imagine
other use cases as well: services of varying importance may be assigned
different priorities, and qdisc will apply the appropriate
priority-based scheduling. By default, the system default pfifo qdisc
is used. The qdisc may be bypassed and legacy queuing resumed by simply
setting the virtio-vsock%d network device to state DOWN. This technique
still allows vsock to work with zero configuration.
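
Roughly, the bypass works as sketched below (illustrative only; the
direct-send helper is a hypothetical name, not a symbol from this
series):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Sketch: with the device UP, packets pass through dev_queue_xmit()
 * and are scheduled by the attached qdisc; with the device DOWN, the
 * transport falls back to direct (legacy FIFO) queuing.
 */
static int vsock_xmit_sketch(struct net_device *dev, struct sk_buff *skb)
{
        skb->dev = dev;

        if (dev->flags & IFF_UP)
                return dev_queue_xmit(skb);

        return vsock_send_skb_direct(skb);     /* hypothetical helper */
}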
In summary, this series introduces these major changes to vsock:
- virtio vsock supports datagrams
- virtio vsock uses struct sk_buff instead of virtio_vsock_pkt
  - Because virtio vsock uses sk_buff, it also uses sock_alloc_send_skb,
    which applies the throttling threshold sk_sndbuf (see the sketch
    after this list).
- The vsock socket layer supports returning errors other than -ENOMEM.
  - This is used to return -EAGAIN when the sk_sndbuf threshold is
    reached.
- virtio vsock uses a net_device, through which qdisc may be used.
  - qdisc allows scheduling policies to be applied to vsock flows.
  - Some qdiscs, like SFQ, may help vsock avoid transport layer
    starvation. That is, they may prevent datagrams from flooding out
    stream flows. The benefit is that additional virtqueues are not
    needed for datagrams.
- The net_device and qdisc are bypassed by simply setting the
  net_device state to DOWN.
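
As a reference for the sk_sndbuf item above, the throttling falls out
of the usual sock_alloc_send_skb() semantics (a sketch assuming the
standard kernel API, not the exact code in this series):

#include <net/sock.h>

/* Sketch: sock_alloc_send_skb() charges the allocation against
 * sk_sndbuf.  With no space left and noblock set (MSG_DONTWAIT), it
 * returns NULL and sets *err to -EAGAIN, which the socket layer can
 * now pass through instead of forcing -ENOMEM.
 */
static struct sk_buff *vsock_alloc_tx_skb(struct sock *sk, size_t len,
                                          int flags, int *err)
{
        return sock_alloc_send_skb(sk, len, flags & MSG_DONTWAIT, err);
}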
[1]: https://lore.kernel.org/all/20210914055440.3121004-1-jiang.wang@bytedance.com/
Bobby Eshleman (5):
vsock: replace virtio_vsock_pkt with sk_buff
vsock: return errors other than -ENOMEM to socket
vsock: add netdev to vhost/virtio vsock
virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
virtio/vsock: add support for dgram
Jiang Wang (1):
vsock_test: add tests for vsock dgram
drivers/vhost/vsock.c | 238 ++++----
include/linux/virtio_vsock.h | 73 ++-
include/net/af_vsock.h | 2 +
include/uapi/linux/virtio_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 30 +-
net/vmw_vsock/hyperv_transport.c | 2 +-
net/vmw_vsock/virtio_transport.c | 237 +++++---
net/vmw_vsock/virtio_transport_common.c | 771 ++++++++++++++++--------
net/vmw_vsock/vmci_transport.c | 9 +-
net/vmw_vsock/vsock_loopback.c | 51 +-
tools/testing/vsock/util.c | 105 ++++
tools/testing/vsock/util.h | 4 +
tools/testing/vsock/vsock_test.c | 195 ++++++
13 files changed, 1176 insertions(+), 543 deletions(-)
--
2.35.1
Hi,

On Mon, Aug 15, 2022 at 10:56:03AM -0700, Bobby Eshleman wrote:
>Hey everybody,
>
>This series introduces datagrams, packet scheduling, and sk_buff usage
>to virtio vsock.

Just a reminder for those who are interested, tomorrow Sep 27 @ 16:00
UTC we will discuss more about the next steps for this series in this
room: https://meet.google.com/fxi-vuzr-jjb
(I'll try to record it and take notes that we will share)

Bobby, thank you so much for working on this! It would be great to
solve the fairness issue and support datagram!

I took a look at the series, left some comments in the individual
patches, and add some advice here that we could pick up tomorrow:
- it would be nice to run benchmarks (e.g., iperf-vsock, uperf, etc.)
  to see how much the changes cost (e.g. sk_buff use)
- we should take care also of other transports (i.e. vmci, hyperv),
  the uAPI should be as close as possible regardless of the transport

About the use of netdev, it seems the most controversial point and I
understand Jakub and Michael's concerns. Tomorrow would be great if you
can update us if you have found any way to avoid it, just reusing a
packet scheduler somehow.
It would be great if we could make it available for all transports (I'm
not asking you to implement it for all, but to have a generic api that
others can use).

But we can talk about that tomorrow!

Thanks,
Stefano
On Mon, Sep 26, 2022 at 03:42:19PM +0200, Stefano Garzarella wrote:
>Just a reminder for those who are interested, tomorrow Sep 27 @ 16:00
>UTC we will discuss more about the next steps for this series in this
>room: https://meet.google.com/fxi-vuzr-jjb
>(I'll try to record it and take notes that we will share)

Thank you all for participating in the call!
I'm attaching video/audio recording and notes (feel free to update it).

Notes:
https://docs.google.com/document/d/14UHH0tEaBKfElLZjNkyKUs_HnOgHhZZBqIS86VEIqR0/edit?usp=sharing

Video recording:
https://drive.google.com/file/d/1vUvTc_aiE1mB30tLPeJjANnb915-CIKa/view?usp=sharing

Thanks,
Stefano
On Mon, Sep 26, 2022 at 03:42:19PM +0200, Stefano Garzarella wrote:
> Just a reminder for those who are interested, tomorrow Sep 27 @ 16:00
> UTC we will discuss more about the next steps for this series in this
> room: https://meet.google.com/fxi-vuzr-jjb
> (I'll try to record it and take notes that we will share)
>
> Bobby, thank you so much for working on this! It would be great to
> solve the fairness issue and support datagram!

I appreciate that, thanks!

> I took a look at the series, left some comments in the individual
> patches, and add some advice here that we could pick up tomorrow:
> - it would be nice to run benchmarks (e.g., iperf-vsock, uperf, etc.)
>   to see how much the changes cost (e.g. sk_buff use)
> - we should take care also of other transports (i.e. vmci, hyperv),
>   the uAPI should be as close as possible regardless of the transport

Duly noted. I have some measurements with uperf, I'll put the data
together and send that out here.

Regarding the uAPI topic, I'll save that topic for our conversation
tomorrow as I think the netdev topic will weigh on it.

> About the use of netdev, it seems the most controversial point and I
> understand Jakub and Michael's concerns. Tomorrow would be great if you
> can update us if you have found any way to avoid it, just reusing a
> packet scheduler somehow.
> It would be great if we could make it available for all transports (I'm
> not asking you to implement it for all, but to have a generic api that
> others can use).
>
> But we can talk about that tomorrow!

Sounds good, talk to you then!

Best,
Bobby
Hi Bobby,
If you are attending Linux Foundation conferences in Dublin, Ireland
next week (Linux Plumbers Conference, Open Source Summit Europe, KVM
Forum, ContainerCon Europe, CloudOpen Europe, etc) then you could meet
Stefano Garzarella and others to discuss this patch series.

Using netdev and sk_buff is a big change to vsock. Discussing your
requirements and the future direction of vsock in person could help.

If you won't be in Dublin, don't worry. You can schedule a video call
if you feel it would be helpful to discuss these topics.

Stefan
On Tue, Sep 06, 2022 at 09:26:33AM -0400, Stefan Hajnoczi wrote:
> If you won't be in Dublin, don't worry. You can schedule a video call
> if you feel it would be helpful to discuss these topics.

Hey Stefan,

That sounds like a great idea! I was unable to make the Dublin trip
work so I think a video call would be best, of course if okay with
everyone.

Thanks,
Bobby
On Thu, Aug 18, 2022 at 02:39:32PM +0000, Bobby Eshleman wrote:
>That sounds like a great idea!

Yep, I agree!

>I was unable to make the Dublin trip work so I think a video call
>would be best, of course if okay with everyone.

It will work for me, but I'll be a bit busy in the next 2 weeks:
From Sep 12 to Sep 14 I'll be at KVM Forum, so it may be difficult to
arrange, but we can try.
Sep 15 I'm not available.
Sep 16 I'm traveling, but early in my morning, so I should be available.
From Sep 19 to Sep 23 I'll be mostly off, but I can try to find some
slots if needed.
From Sep 26 I'm back and fully available.

Let's see if others are available and try to find a slot :-)

Thanks,
Stefano
On Thu, Aug 18, 2022 at 02:39:32PM +0000, Bobby Eshleman wrote:
>That sounds like a great idea! I was unable to make the Dublin trip
>work so I think a video call would be best, of course if okay with
>everyone.

Looking better at the KVM forum sched, I found 1h slot for Sep 15 at
16:30 UTC.

Could this work for you?

It would be nice to also have HyperV and VMCI people in the call and
anyone else who is interested of course.

@Dexuan @Bryan @Vishnu can you attend?

@MST @Jason @Stefan if you can be there that would be great, we could
connect together from Dublin.

Thanks,
Stefano
Hey Stefano, thanks for sending this out.

On Thu, Sep 08, 2022 at 04:36:52PM +0200, Stefano Garzarella wrote:
>
> Looking better at the KVM forum sched, I found 1h slot for Sep 15 at
> 16:30 UTC.
>
> Could this work for you?

Unfortunately, I can't make this time slot.

My schedule also opens up a lot the week of the 26th, especially
between 16:00 and 19:00 UTC, as well as after 22:00 UTC.

Best,
Bobby
On Fri, Sep 9, 2022 at 8:13 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Thu, Sep 08, 2022 at 04:36:52PM +0200, Stefano Garzarella wrote:
> >
> > Looking better at the KVM forum sched, I found 1h slot for Sep 15 at
> > 16:30 UTC.
> >
> > Could this work for you?
>
> Unfortunately, I can't make this time slot.

No problem at all!

> My schedule also opens up a lot the week of the 26th, especially
> between 16:00 and 19:00 UTC, as well as after 22:00 UTC.

Great, that week works for me too.
What about Sep 27 @ 16:00 UTC?

Thanks,
Stefano
On Mon, Sep 12, 2022 at 08:12:58PM +0200, Stefano Garzarella wrote:
> Great, that week works for me too.
> What about Sep 27 @ 16:00 UTC?

That time works for me!

Thanks,
Bobby
On Mon, Sep 12, 2022 at 8:28 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Mon, Sep 12, 2022 at 08:12:58PM +0200, Stefano Garzarella wrote:
> > Great, that week works for me too.
> > What about Sep 27 @ 16:00 UTC?
>
> That time works for me!

Great! I sent you an invitation.

For others that want to join the discussion, we will meet Sep 27 @
16:00 UTC at this room:
https://meet.google.com/fxi-vuzr-jjb

Thanks,
Stefano
On Fri, Sep 16, 2022 at 05:51:22AM +0200, Stefano Garzarella wrote:
> Great! I sent you an invitation.

Awesome, see you then!

Thanks,
Bobby
On Mon, Aug 15, 2022 at 10:56:03AM -0700, Bobby Eshleman wrote:
> Hey everybody,
>
> This series introduces datagrams, packet scheduling, and sk_buff usage
> to virtio vsock.
[...]
> priority-based scheduling. By default, the system default pfifo qdisc
> is used. The qdisc may be bypassed and legacy queuing resumed by simply
> setting the virtio-vsock%d network device to state DOWN. This technique
> still allows vsock to work with zero configuration.

The basic question to answer then is this: with a net device qdisc
etc in the picture, how is this different from virtio net then?
Why do you still want to use vsock?
On 2022/8/17 14:54, Michael S. Tsirkin wrote:
> The basic question to answer then is this: with a net device qdisc
> etc in the picture, how is this different from virtio net then?
> Why do you still want to use vsock?

Or maybe it's time to revisit an old idea[1] to unify at least the
driver part (e.g. using the virtio-net driver for vsock, so we can have
all the features that vsock is lacking now)?

Thanks

[1] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039783.html
On Thu, Aug 18, 2022 at 12:28:48PM +0800, Jason Wang wrote:
>Or maybe it's time to revisit an old idea[1] to unify at least the
>driver part (e.g. using the virtio-net driver for vsock, so we can have
>all the features that vsock is lacking now)?

Sorry for coming late to the discussion!

This would be great, though, last time I had looked at it, I had found
it quite complicated. The main problem is trying to avoid all the
net-specific stuff (MTU, ethernet header, HW offloading, etc.).

Maybe we could start thinking about this idea by adding a new transport
to vsock (e.g. virtio-net-vsock) completely separate from what we have
now.

Thanks,
Stefano
On Wed, Aug 17, 2022 at 02:54:33AM -0400, Michael S. Tsirkin wrote:
> The basic question to answer then is this: with a net device qdisc
> etc in the picture, how is this different from virtio net then?
> Why do you still want to use vsock?

When using virtio-net, users looking for inter-VM communication are
required to setup bridges, TAPs, allocate IP addresses or setup DNS,
etc... and then finally when you have a network, you can open a socket
on an IP address and port. This is the configuration that vsock avoids.
For vsock, we just need a CID and a port, but no network configuration.

This benefit still exists after introducing a netdev to vsock. The
major added benefit is that when you have many different vsock flows in
parallel and you are observing issues like starvation and tail latency
that are caused by pure FIFO queuing, now there is a mechanism to fix
those issues. You might recall such an issue discussed here[1].

[1]: https://gitlab.com/vsock/vsock/-/issues/1
On Tue, Aug 16, 2022 at 09:42:51AM +0000, Bobby Eshleman wrote: > > The basic question to answer then is this: with a net device qdisc > > etc in the picture, how is this different from virtio net then? > > Why do you still want to use vsock? > > > > When using virtio-net, users looking for inter-VM communication are > required to setup bridges, TAPs, allocate IP addresses or setup DNS, > etc... and then finally when you have a network, you can open a socket > on an IP address and port. This is the configuration that vsock avoids. > For vsock, we just need a CID and a port, but no network configuration. Surely when you mention DNS you are going overboard? vsock doesn't remove the need for DNS as much as it does not support it. -- MST
On Wed, Aug 17, 2022 at 01:02:52PM -0400, Michael S. Tsirkin wrote:
> Surely when you mention DNS you are going overboard? vsock doesn't
> remove the need for DNS as much as it does not support it.

Oops, s/DNS/dhcp.
On Tue, Aug 16, 2022 at 11:08:26AM +0000, Bobby Eshleman wrote:
> Oops, s/DNS/dhcp.

That too.

--
MST
On Wed, Aug 17, 2022 at 01:53:32PM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 16, 2022 at 11:08:26AM +0000, Bobby Eshleman wrote:
> > Oops, s/DNS/dhcp.
>
> That too.

Sure, setting up dhcp would be overboard for just inter-VM comms. It is
fair to mention that vsock CIDs also need to be managed / allocated
somehow.
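
For concreteness, the zero-configuration point looks like this from
userspace: only a CID and a port are needed (a minimal sketch; dgram
support over virtio assumes the feature added by this series):

#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
        /* Peer address: just a CID and a port, no IP configuration. */
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid = 3,           /* example guest CID */
                .svm_port = 1234,       /* example port */
        };
        int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

        if (fd < 0)
                return 1;

        sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr));
        return 0;
}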
CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:03AM -0700, Bobby Eshleman wrote:
> Hey everybody,
>
> This series introduces datagrams, packet scheduling, and sk_buff usage
> to virtio vsock.
[...]
On Mon, Aug 15, 2022 at 10:56:03AM -0700, Bobby Eshleman wrote:
> Hey everybody,
>
> This series introduces datagrams, packet scheduling, and sk_buff usage
> to virtio vsock.
[...]

Given this affects the driver/device interface I'd like to ask you to
please copy the virtio-dev mailing list on these patches. It's
subscriber only I'm afraid, so you will need to subscribe :(
On Mon, Aug 15, 2022 at 04:39:08PM -0400, Michael S. Tsirkin wrote:
>
> Given this affects the driver/device interface I'd like to ask you to
> please copy the virtio-dev mailing list on these patches. It's
> subscriber only I'm afraid, so you will need to subscribe :(

Ah makes sense, will do!

Best,
Bobby