include/net/af_vsock.h | 2 + net/vmw_vsock/af_vsock.c | 33 +++++++- net/vmw_vsock/hyperv_transport.c | 7 ++ net/vmw_vsock/virtio_transport_common.c | 7 +- net/vmw_vsock/vmci_transport_notify.c | 10 +-- net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +-- tools/testing/vsock/vsock_test.c | 108 +++++++++++++++++++++++++++ 7 files changed, 162 insertions(+), 17 deletions(-)
Hello,
This patchset includes some updates for SO_RCVLOWAT:
1) af_vsock:
During my experiments with zerocopy receive, i found, that in some
cases, poll() implementation violates POSIX: when socket has non-
default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
POLLRDNORM bits in 'revents' even number of bytes available to read
on socket is smaller than SO_RCVLOWAT value. In this case,user sees
POLLIN flag and then tries to read data(for example using 'read()'
call), but read call will be blocked, because SO_RCVLOWAT logic is
supported in dequeue loop in af_vsock.c. But the same time, POSIX
requires that:
"POLLIN Data other than high-priority data may be read without
blocking.
POLLRDNORM Normal data may be read without blocking."
See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
So, we have, that poll() syscall returns POLLIN, but read call will
be blocked.
Also in man page socket(7) i found that:
"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
socket as readable only if at least SO_RCVLOWAT bytes are available."
I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
this case for TCP socket, it works as POSIX required.
I've added some fixes to af_vsock.c and virtio_transport_common.c,
test is also implemented.
2) virtio/vsock:
It adds some optimization to wake ups, when new data arrived. Now,
SO_RCVLOWAT is considered before wake up sleepers who wait new data.
There is no sense, to kick waiter, when number of available bytes
in socket's queue < SO_RCVLOWAT, because if we wake up reader in
this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
exit from poll() will be "spurious". This logic is also used in TCP
sockets.
3) vmci/vsock:
Same as 2), but i'm not sure about this changes. Will be very good,
to get comments from someone who knows this code.
4) Hyper-V:
As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
support SO_RCVLOWAT, so he suggested to disable this feature for
Hyper-V.
Thank You
Arseniy Krasnov(9):
vsock: SO_RCVLOWAT transport set callback
hv_sock: disable SO_RCVLOWAT support
virtio/vsock: use 'target' in notify_poll_in callback
vmci/vsock: use 'target' in notify_poll_in callback
vsock: pass sock_rcvlowat to notify_poll_in as target
vsock: add API call for data ready
virtio/vsock: check SO_RCVLOWAT before wake up reader
vmci/vsock: check SO_RCVLOWAT before wake up reader
vsock_test: POLLIN + SO_RCVLOWAT test
include/net/af_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 33 +++++++-
net/vmw_vsock/hyperv_transport.c | 7 ++
net/vmw_vsock/virtio_transport_common.c | 7 +-
net/vmw_vsock/vmci_transport_notify.c | 10 +--
net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +--
tools/testing/vsock/vsock_test.c | 108 +++++++++++++++++++++++++++
7 files changed, 162 insertions(+), 17 deletions(-)
Changelog:
v1 -> v2:
1) Patches for VMCI transport(same as for virtio-vsock).
2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
3) Waiting logic in test was updated(sleep() -> poll()).
v2 -> v3:
1) Patches were reordered.
2) Commit message updated in 0005.
3) Check 'transport' pointer in 0001 for NULL.
v3 -> v4:
1) vsock_set_rcvlowat() logic changed. Previous version required
assigned transport and always called its 'set_rcvlowat' callback
(if present). Now, assignment is not needed.
2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
3) 0009 - small refactoring and style fixes.
4) RFC tag was removed.
--
2.25.1
On Fri, Aug 19, 2022 at 05:21:58AM +0000, Arseniy Krasnov wrote: > Hello, > > This patchset includes some updates for SO_RCVLOWAT: > > 1) af_vsock: > During my experiments with zerocopy receive, i found, that in some > cases, poll() implementation violates POSIX: when socket has non- > default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and > POLLRDNORM bits in 'revents' even number of bytes available to read > on socket is smaller than SO_RCVLOWAT value. In this case,user sees > POLLIN flag and then tries to read data(for example using 'read()' > call), but read call will be blocked, because SO_RCVLOWAT logic is > supported in dequeue loop in af_vsock.c. But the same time, POSIX > requires that: > > "POLLIN Data other than high-priority data may be read without > blocking. > POLLRDNORM Normal data may be read without blocking." > > See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293. > > So, we have, that poll() syscall returns POLLIN, but read call will > be blocked. > > Also in man page socket(7) i found that: > > "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a > socket as readable only if at least SO_RCVLOWAT bytes are available." > > I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it > uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with > this case for TCP socket, it works as POSIX required. > > I've added some fixes to af_vsock.c and virtio_transport_common.c, > test is also implemented. > > 2) virtio/vsock: > It adds some optimization to wake ups, when new data arrived. Now, > SO_RCVLOWAT is considered before wake up sleepers who wait new data. > There is no sense, to kick waiter, when number of available bytes > in socket's queue < SO_RCVLOWAT, because if we wake up reader in > this case, it will wait for SO_RCVLOWAT data anyway during dequeue, > or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such > exit from poll() will be "spurious". This logic is also used in TCP > sockets. > > 3) vmci/vsock: > Same as 2), but i'm not sure about this changes. Will be very good, > to get comments from someone who knows this code. > > 4) Hyper-V: > As Dexuan Cui mentioned, for Hyper-V transport it is difficult to > support SO_RCVLOWAT, so he suggested to disable this feature for > Hyper-V. > > Thank You Hi Arseniy, Stefano will be online again on Monday. I suggest we wait for him to review this series. If it's urgent, please let me know and I'll take a look. Thanks, Stefan > Arseniy Krasnov(9): > vsock: SO_RCVLOWAT transport set callback > hv_sock: disable SO_RCVLOWAT support > virtio/vsock: use 'target' in notify_poll_in callback > vmci/vsock: use 'target' in notify_poll_in callback > vsock: pass sock_rcvlowat to notify_poll_in as target > vsock: add API call for data ready > virtio/vsock: check SO_RCVLOWAT before wake up reader > vmci/vsock: check SO_RCVLOWAT before wake up reader > vsock_test: POLLIN + SO_RCVLOWAT test > > include/net/af_vsock.h | 2 + > net/vmw_vsock/af_vsock.c | 33 +++++++- > net/vmw_vsock/hyperv_transport.c | 7 ++ > net/vmw_vsock/virtio_transport_common.c | 7 +- > net/vmw_vsock/vmci_transport_notify.c | 10 +-- > net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +-- > tools/testing/vsock/vsock_test.c | 108 +++++++++++++++++++++++++++ > 7 files changed, 162 insertions(+), 17 deletions(-) > > Changelog: > > v1 -> v2: > 1) Patches for VMCI transport(same as for virtio-vsock). > 2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting). > 3) Waiting logic in test was updated(sleep() -> poll()). > > v2 -> v3: > 1) Patches were reordered. > 2) Commit message updated in 0005. > 3) Check 'transport' pointer in 0001 for NULL. > > v3 -> v4: > 1) vsock_set_rcvlowat() logic changed. Previous version required > assigned transport and always called its 'set_rcvlowat' callback > (if present). Now, assignment is not needed. > 2) 0003,0004,0005,0006,0007,0008 - commit messages updated. > 3) 0009 - small refactoring and style fixes. > 4) RFC tag was removed. > > -- > 2.25.1
On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote: > Stefano will be online again on Monday. I suggest we wait for him to > review this series. If it's urgent, please let me know and I'll take a > look. It was already applied, sorry about that. But please continue with review as if it wasn't. We'll just revert based on Stefano's feedback as needed.
On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote: > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote: > > Stefano will be online again on Monday. I suggest we wait for him to > > review this series. If it's urgent, please let me know and I'll take a > > look. > > It was already applied, sorry about that. But please continue with > review as if it wasn't. We'll just revert based on Stefano's feedback > as needed. Okay, no problem. Stefan
On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote: > On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote: > > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote: > > > Stefano will be online again on Monday. I suggest we wait for him to > > > review this series. If it's urgent, please let me know and I'll take a > > > look. > > > > It was already applied, sorry about that. But please continue with > > review as if it wasn't. We'll just revert based on Stefano's feedback > > as needed. > > Okay, no problem. For the records, I applied the series since it looked to me Arseniy addressed all the feedback from Stefano on the first patch (the only one still lacking an acked-by/reviewed-by tag) and I thought it would be better avoiding such delay. We are still early in this net-next cycle, I think it can be improved incrementally as needed. Paolo
On Tue, Aug 23, 2022 at 10:57:01PM +0200, Paolo Abeni wrote: >On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote: >> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote: >> > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote: >> > > Stefano will be online again on Monday. I suggest we wait for him to >> > > review this series. If it's urgent, please let me know and I'll take a >> > > look. >> > >> > It was already applied, sorry about that. But please continue with >> > review as if it wasn't. We'll just revert based on Stefano's feedback >> > as needed. Just back, and I'm fine with this version, so thanks for merging! I also run tests with virtio-vsock and everything is fine. >> >> Okay, no problem. > >For the records, I applied the series since it looked to me Arseniy >addressed all the feedback from Stefano on the first patch (the only >one still lacking an acked-by/reviewed-by tag) and I thought it would >be better avoiding such delay. Yep, from v3 I asked some changes on the first patch that Arseniy addressed in this version, and we were waiting an ack for VMCI changes (thanks Vishnu for giving it). So, it should be fine. Thanks, Stefano
On 23.08.2022 23:57, Paolo Abeni wrote: > On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote: >> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote: >>> On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote: >>>> Stefano will be online again on Monday. I suggest we wait for him to >>>> review this series. If it's urgent, please let me know and I'll take a >>>> look. Hi, sure, nothing urgent, no problem. Let's wait for Stefano! Is there something wrong with this patchset? I've replaced RFC -> net-next since Stefano reviewed all patches except 0001 and suggested to use net-next in v4. >>> >>> It was already applied, sorry about that. But please continue with >>> review as if it wasn't. We'll just revert based on Stefano's feedback >>> as needed. >> >> Okay, no problem. > > For the records, I applied the series since it looked to me Arseniy > addressed all the feedback from Stefano on the first patch (the only > one still lacking an acked-by/reviewed-by tag) and I thought it would > be better avoiding such delay. Yes, only 0001 lacks reviewed-by > > We are still early in this net-next cycle, I think it can be improved > incrementally as needed. > > Paolo > Thank You, Arseniy
Hello:
This series was applied to netdev/net-next.git (master)
by Paolo Abeni <pabeni@redhat.com>:
On Fri, 19 Aug 2022 05:21:58 +0000 you wrote:
> Hello,
>
> This patchset includes some updates for SO_RCVLOWAT:
>
> 1) af_vsock:
> During my experiments with zerocopy receive, i found, that in some
> cases, poll() implementation violates POSIX: when socket has non-
> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
> POLLRDNORM bits in 'revents' even number of bytes available to read
> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
> POLLIN flag and then tries to read data(for example using 'read()'
> call), but read call will be blocked, because SO_RCVLOWAT logic is
> supported in dequeue loop in af_vsock.c. But the same time, POSIX
> requires that:
>
> [...]
Here is the summary with links:
- [net-next,v4,1/9] vsock: SO_RCVLOWAT transport set callback
https://git.kernel.org/netdev/net-next/c/e38f22c860ed
- [net-next,v4,2/9] hv_sock: disable SO_RCVLOWAT support
https://git.kernel.org/netdev/net-next/c/24764f8d3c31
- [net-next,v4,3/9] virtio/vsock: use 'target' in notify_poll_in callback
https://git.kernel.org/netdev/net-next/c/e7a3266c9167
- [net-next,v4,4/9] vmci/vsock: use 'target' in notify_poll_in callback
https://git.kernel.org/netdev/net-next/c/a274f6ff3c5c
- [net-next,v4,5/9] vsock: pass sock_rcvlowat to notify_poll_in as target
https://git.kernel.org/netdev/net-next/c/ee0b3843a269
- [net-next,v4,6/9] vsock: add API call for data ready
https://git.kernel.org/netdev/net-next/c/f2fdcf67aceb
- [net-next,v4,7/9] virtio/vsock: check SO_RCVLOWAT before wake up reader
https://git.kernel.org/netdev/net-next/c/39f1ed33a448
- [net-next,v4,8/9] vmci/vsock: check SO_RCVLOWAT before wake up reader
https://git.kernel.org/netdev/net-next/c/e061aed99855
- [net-next,v4,9/9] vsock_test: POLLIN + SO_RCVLOWAT test
https://git.kernel.org/netdev/net-next/c/b1346338fbae
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
© 2016 - 2026 Red Hat, Inc.