There was an issue with SO_LINGER: instead of blocking until all queued
messages for the socket have been successfully sent (or the linger timeout
has been reached), close() would block until packets were handled by the
peer.

Add a check to alert on close() lingering when it should not.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
tools/testing/vsock/vsock_test.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index d0f6d253ac72d08a957cb81a3c38fcc72bec5a53..82d0bc20dfa75041f04eada1b4310be2f7c3a0c1 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1788,13 +1788,16 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
 	close(fd);
 }
 
+#define LINGER_TIMEOUT 1 /* seconds */
+
 static void test_stream_linger_client(const struct test_opts *opts)
 {
 	struct linger optval = {
 		.l_onoff = 1,
-		.l_linger = 1
+		.l_linger = LINGER_TIMEOUT
 	};
-	int fd;
+	int bytes_unsent, fd;
+	time_t ts;
 
 	fd = vsock_stream_connect(opts->peer_cid, opts->peer_port);
 	if (fd < 0) {
@@ -1807,7 +1810,28 @@ static void test_stream_linger_client(const struct test_opts *opts)
 		exit(EXIT_FAILURE);
 	}
 
+	/* Byte left unread to expose any incorrect behaviour. */
+	send_byte(fd, 1, 0);
+
+	/* Reuse LINGER_TIMEOUT to wait for bytes_unsent == 0. */
+	timeout_begin(LINGER_TIMEOUT);
+	do {
+		if (ioctl(fd, SIOCOUTQ, &bytes_unsent) < 0) {
+			perror("ioctl(SIOCOUTQ)");
+			exit(EXIT_FAILURE);
+		}
+		timeout_check("ioctl(SIOCOUTQ) == 0");
+	} while (bytes_unsent != 0);
+	timeout_end();
+
+	ts = current_nsec();
 	close(fd);
+	if ((current_nsec() - ts) / NSEC_PER_SEC > 0) {
+		fprintf(stderr, "Unexpected lingering on close()\n");
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("DONE");
 }
 
 static void test_stream_linger_server(const struct test_opts *opts)
@@ -1820,7 +1844,7 @@ static void test_stream_linger_server(const struct test_opts *opts)
 		exit(EXIT_FAILURE);
 	}
 
-	vsock_wait_remote_close(fd);
+	control_expectln("DONE");
 	close(fd);
 }
 
--
2.49.0
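[Editor's note: as background for the option under test, SO_LINGER is an ordinary socket option carrying a struct linger (l_onoff, l_linger). A minimal, self-contained Python sketch, using a plain TCP socket purely for illustration (the patch above exercises the same option on AF_VSOCK):

```python
import socket
import struct

# struct linger is two native ints: l_onoff and l_linger (timeout in
# seconds). Enable lingering with a 1-second timeout and read it back.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 1))

onoff, linger = struct.unpack(
    "ii", s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8))
print(onoff, linger)  # linger enabled, 1-second timeout
s.close()
```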
On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
>There was an issue with SO_LINGER: instead of blocking until all queued
>messages for the socket have been successfully sent (or the linger timeout
>has been reached), close() would block until packets were handled by the
>peer.
This is a new behaviour that only new kernels will follow, so I think
it is better to add a new test instead of extending a pre-existing test
that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
The old test should continue to check the null-ptr-deref also for old
kernels, while the new test will check the new behaviour, so we can skip
the new test while testing an old kernel.
Thanks,
Stefano
>
>Add a check to alert on close() lingering when it should not.
>
>Signed-off-by: Michal Luczaj <mhal@rbox.co>
>---
> tools/testing/vsock/vsock_test.c | 30 +++++++++++++++++++++++++++---
> 1 file changed, 27 insertions(+), 3 deletions(-)
>
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index d0f6d253ac72d08a957cb81a3c38fcc72bec5a53..82d0bc20dfa75041f04eada1b4310be2f7c3a0c1 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -1788,13 +1788,16 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
> 	close(fd);
> }
> 
>+#define LINGER_TIMEOUT 1 /* seconds */
>+
> static void test_stream_linger_client(const struct test_opts *opts)
> {
> 	struct linger optval = {
> 		.l_onoff = 1,
>-		.l_linger = 1
>+		.l_linger = LINGER_TIMEOUT
> 	};
>-	int fd;
>+	int bytes_unsent, fd;
>+	time_t ts;
> 
> 	fd = vsock_stream_connect(opts->peer_cid, opts->peer_port);
> 	if (fd < 0) {
>@@ -1807,7 +1810,28 @@ static void test_stream_linger_client(const struct test_opts *opts)
> 		exit(EXIT_FAILURE);
> 	}
> 
>+	/* Byte left unread to expose any incorrect behaviour. */
>+	send_byte(fd, 1, 0);
>+
>+	/* Reuse LINGER_TIMEOUT to wait for bytes_unsent == 0. */
>+	timeout_begin(LINGER_TIMEOUT);
>+	do {
>+		if (ioctl(fd, SIOCOUTQ, &bytes_unsent) < 0) {
>+			perror("ioctl(SIOCOUTQ)");
>+			exit(EXIT_FAILURE);
>+		}
>+		timeout_check("ioctl(SIOCOUTQ) == 0");
>+	} while (bytes_unsent != 0);
>+	timeout_end();
>+
>+	ts = current_nsec();
> 	close(fd);
>+	if ((current_nsec() - ts) / NSEC_PER_SEC > 0) {
>+		fprintf(stderr, "Unexpected lingering on close()\n");
>+		exit(EXIT_FAILURE);
>+	}
>+
>+	control_writeln("DONE");
> }
> 
> static void test_stream_linger_server(const struct test_opts *opts)
>@@ -1820,7 +1844,7 @@ static void test_stream_linger_server(const struct test_opts *opts)
> 		exit(EXIT_FAILURE);
> 	}
> 
>-	vsock_wait_remote_close(fd);
>+	control_expectln("DONE");
> 	close(fd);
> }
> 
>
>--
>2.49.0
>
On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
> >There was an issue with SO_LINGER: instead of blocking until all queued
> >messages for the socket have been successfully sent (or the linger timeout
> >has been reached), close() would block until packets were handled by the
> >peer.
>
> This is a new behaviour that only new kernels will follow, so I think
> it is better to add a new test instead of extending a pre-existing test
> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
>
> The old test should continue to check the null-ptr-deref also for old
> kernels, while the new test will check the new behaviour, so we can skip
> the new test while testing an old kernel.

I also saw that we don't have any test to verify that actually the
lingering is working, should we add it since we are touching it?

Thanks,
Stefano
On 5/6/25 11:46, Stefano Garzarella wrote:
> On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
>>> There was an issue with SO_LINGER: instead of blocking until all queued
>>> messages for the socket have been successfully sent (or the linger timeout
>>> has been reached), close() would block until packets were handled by the
>>> peer.
>>
>> This is a new behaviour that only new kernels will follow, so I think
>> it is better to add a new test instead of extending a pre-existing test
>> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
>>
>> The old test should continue to check the null-ptr-deref also for old
>> kernels, while the new test will check the new behaviour, so we can skip
>> the new test while testing an old kernel.

Right, I'll split it.

> I also saw that we don't have any test to verify that actually the
> lingering is working, should we add it since we are touching it?

Yeah, I agree we should. Do you have any suggestion how this could be done
reliably?

Thanks,
Michal
On Wed, 7 May 2025 at 00:47, Michal Luczaj <mhal@rbox.co> wrote:
>
> On 5/6/25 11:46, Stefano Garzarella wrote:
> > On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
> >>> There was an issue with SO_LINGER: instead of blocking until all queued
> >>> messages for the socket have been successfully sent (or the linger timeout
> >>> has been reached), close() would block until packets were handled by the
> >>> peer.
> >>
> >> This is a new behaviour that only new kernels will follow, so I think
> >> it is better to add a new test instead of extending a pre-existing test
> >> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
> >>
> >> The old test should continue to check the null-ptr-deref also for old
> >> kernels, while the new test will check the new behaviour, so we can skip
> >> the new test while testing an old kernel.
>
> Right, I'll split it.
>
> > I also saw that we don't have any test to verify that actually the
> > lingering is working, should we add it since we are touching it?
>
> Yeah, I agree we should. Do you have any suggestion how this could be done
> reliably?

Can we play with SO_VM_SOCKETS_BUFFER_SIZE like in credit-update tests?

One peer can set it (e.g. to 1k), accept the connection, but without
reading anything. The other peer can set the linger timeout, send more
bytes than the buffer size set by the receiver.
At this point the extra bytes should stay on the sender socket buffer,
so we can do the close() and it should time out, and we can check if
it happens.

WDYT?

Thanks,
Stefano
On 5/7/25 10:26, Stefano Garzarella wrote:
> On Wed, 7 May 2025 at 00:47, Michal Luczaj <mhal@rbox.co> wrote:
>>
>> On 5/6/25 11:46, Stefano Garzarella wrote:
>>> On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>
>>>> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
>>>>> There was an issue with SO_LINGER: instead of blocking until all queued
>>>>> messages for the socket have been successfully sent (or the linger timeout
>>>>> has been reached), close() would block until packets were handled by the
>>>>> peer.
>>>>
>>>> This is a new behaviour that only new kernels will follow, so I think
>>>> it is better to add a new test instead of extending a pre-existing test
>>>> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
>>>>
>>>> The old test should continue to check the null-ptr-deref also for old
>>>> kernels, while the new test will check the new behaviour, so we can skip
>>>> the new test while testing an old kernel.
>>
>> Right, I'll split it.
>>
>>> I also saw that we don't have any test to verify that actually the
>>> lingering is working, should we add it since we are touching it?
>>
>> Yeah, I agree we should. Do you have any suggestion how this could be done
>> reliably?
>
> Can we play with SO_VM_SOCKETS_BUFFER_SIZE like in credit-update tests?
>
> One peer can set it (e.g. to 1k), accept the connection, but without
> reading anything. The other peer can set the linger timeout, send more
> bytes than the buffer size set by the receiver.
> At this point the extra bytes should stay on the sender socket buffer,
> so we can do the close() and it should time out, and we can check if
> it happens.
>
> WDYT?
Haven't we discussed this approach in [1]? I've reported that I can't make
it work. But maybe I'm misunderstanding something, please see the code below.
[1]:
https://lore.kernel.org/netdev/df2d51fd-03e7-477f-8aea-938446f47864@rbox.co/
import termios, time
from socket import *

SIOCOUTQ = termios.TIOCOUTQ
VMADDR_CID_LOCAL = 1
SZ = 1024

def set_linger(s, timeout):
    optval = (timeout << 32) | 1
    s.setsockopt(SOL_SOCKET, SO_LINGER, optval)
    assert s.getsockopt(SOL_SOCKET, SO_LINGER) == optval

def set_bufsz(s, size):
    s.setsockopt(AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, size)
    assert s.getsockopt(AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE) == size

def check_lingering(addr):
    lis = socket(AF_VSOCK, SOCK_STREAM)
    lis.bind(addr)
    lis.listen()
    set_bufsz(lis, SZ)

    s = socket(AF_VSOCK, SOCK_STREAM)
    set_linger(s, 5)
    s.connect(lis.getsockname())

    p, _ = lis.accept()

    s.send(b'x')
    p.recv(1)

    print("sending...")
    s.send(b'x' * (SZ+1)) # blocks
    print("sent")

    print("closing...")
    ts = time.time()
    s.close()
    print("done in %ds" % (time.time() - ts))

check_lingering((VMADDR_CID_LOCAL, 1234))
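[Editor's note: the `(timeout << 32) | 1` value in set_linger() above is just struct linger packed by hand, and it assumes a little-endian host; `struct.pack` expresses the same bytes explicitly:

```python
import struct

timeout = 5
# struct linger is two native ints: l_onoff, l_linger.
packed = struct.pack("ii", 1, timeout)  # explicit packing
shifted = (timeout << 32) | 1           # the helper's hand-rolled encoding
# Equivalent on a little-endian host (x86, typical arm64):
assert packed == shifted.to_bytes(8, "little")
```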
On Mon, May 12, 2025 at 02:23:12PM +0200, Michal Luczaj wrote:
>On 5/7/25 10:26, Stefano Garzarella wrote:
>> On Wed, 7 May 2025 at 00:47, Michal Luczaj <mhal@rbox.co> wrote:
>>>
>>> On 5/6/25 11:46, Stefano Garzarella wrote:
>>>> On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>>
>>>>> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
>>>>>> There was an issue with SO_LINGER: instead of blocking until all queued
>>>>>> messages for the socket have been successfully sent (or the linger timeout
>>>>>> has been reached), close() would block until packets were handled by the
>>>>>> peer.
>>>>>
>>>>> This is a new behaviour that only new kernels will follow, so I think
>>>>> it is better to add a new test instead of extending a pre-existing test
>>>>> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
>>>>>
>>>>> The old test should continue to check the null-ptr-deref also for old
>>>>> kernels, while the new test will check the new behaviour, so we can skip
>>>>> the new test while testing an old kernel.
>>>
>>> Right, I'll split it.
>>>
>>>> I also saw that we don't have any test to verify that actually the
>>>> lingering is working, should we add it since we are touching it?
>>>
>>> Yeah, I agree we should. Do you have any suggestion how this could be done
>>> reliably?
>>
>> Can we play with SO_VM_SOCKETS_BUFFER_SIZE like in credit-update tests?
>>
>> One peer can set it (e.g. to 1k), accept the connection, but without
>> reading anything. The other peer can set the linger timeout, send more
>> bytes than the buffer size set by the receiver.
>> At this point the extra bytes should stay on the sender socket buffer,
>> so we can do the close() and it should time out, and we can check if
>> it happens.
>>
>> WDYT?
>
>Haven't we discussed this approach in [1]? I've reported that I can't make
Sorry, I forgot. What was the conclusion? Why can't this work?
>it work. But maybe I'm misunderstanding something, please see the code
>below.
What should I check in the code below?
Thanks,
Stefano
>
>[1]:
>https://lore.kernel.org/netdev/df2d51fd-03e7-477f-8aea-938446f47864@rbox.co/
>
>import termios, time
>from socket import *
>
>SIOCOUTQ = termios.TIOCOUTQ
>VMADDR_CID_LOCAL = 1
>SZ = 1024
>
>def set_linger(s, timeout):
>    optval = (timeout << 32) | 1
>    s.setsockopt(SOL_SOCKET, SO_LINGER, optval)
>    assert s.getsockopt(SOL_SOCKET, SO_LINGER) == optval
>
>def set_bufsz(s, size):
>    s.setsockopt(AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, size)
>    assert s.getsockopt(AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE) == size
>
>def check_lingering(addr):
>    lis = socket(AF_VSOCK, SOCK_STREAM)
>    lis.bind(addr)
>    lis.listen()
>    set_bufsz(lis, SZ)
>
>    s = socket(AF_VSOCK, SOCK_STREAM)
>    set_linger(s, 5)
>    s.connect(lis.getsockname())
>
>    p, _ = lis.accept()
>
>    s.send(b'x')
>    p.recv(1)
>
>    print("sending...")
>    s.send(b'x' * (SZ+1)) # blocks
>    print("sent")
>
>    print("closing...")
>    ts = time.time()
>    s.close()
>    print("done in %ds" % (time.time() - ts))
>
>check_lingering((VMADDR_CID_LOCAL, 1234))
>
On Tue, 20 May 2025 at 10:54, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Mon, May 12, 2025 at 02:23:12PM +0200, Michal Luczaj wrote:
> >On 5/7/25 10:26, Stefano Garzarella wrote:
> >> On Wed, 7 May 2025 at 00:47, Michal Luczaj <mhal@rbox.co> wrote:
> >>>
> >>> On 5/6/25 11:46, Stefano Garzarella wrote:
> >>>> On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>>>>
> >>>>> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
> >>>>>> There was an issue with SO_LINGER: instead of blocking until all queued
> >>>>>> messages for the socket have been successfully sent (or the linger timeout
> >>>>>> has been reached), close() would block until packets were handled by the
> >>>>>> peer.
> >>>>>
> >>>>> This is a new behaviour that only new kernels will follow, so I think
> >>>>> it is better to add a new test instead of extending a pre-existing test
> >>>>> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
> >>>>>
> >>>>> The old test should continue to check the null-ptr-deref also for old
> >>>>> kernels, while the new test will check the new behaviour, so we can skip
> >>>>> the new test while testing an old kernel.
> >>>
> >>> Right, I'll split it.
> >>>
> >>>> I also saw that we don't have any test to verify that actually the
> >>>> lingering is working, should we add it since we are touching it?
> >>>
> >>> Yeah, I agree we should. Do you have any suggestion how this could be done
> >>> reliably?
> >>
> >> Can we play with SO_VM_SOCKETS_BUFFER_SIZE like in credit-update tests?
> >>
> >> One peer can set it (e.g. to 1k), accept the connection, but without
> >> reading anything. The other peer can set the linger timeout, send more
> >> bytes than the buffer size set by the receiver.
> >> At this point the extra bytes should stay on the sender socket buffer,
> >> so we can do the close() and it should time out, and we can check if
> >> it happens.
> >>
> >> WDYT?
> >
> >Haven't we discussed this approach in [1]? I've reported that I can't make
>
> Sorry, I forgot. What was the conclusion? Why can't this work?
>
> >it work. But maybe I'm misunderstanding something, please see the code
> >below.
>
> What should I check in the code below?

Okay, I see the send() is blocking (please explain the issue better next
time, etc.)

I don't want to block this series, so feel free to investigate that
later if we have a way to test it. If I find some time, I'll try to
check if we have a way.

Thanks,
Stefano
On Tue, 20 May 2025 at 11:01, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, 20 May 2025 at 10:54, Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Mon, May 12, 2025 at 02:23:12PM +0200, Michal Luczaj wrote:
> > >On 5/7/25 10:26, Stefano Garzarella wrote:
> > >> On Wed, 7 May 2025 at 00:47, Michal Luczaj <mhal@rbox.co> wrote:
> > >>>
> > >>> On 5/6/25 11:46, Stefano Garzarella wrote:
> > >>>> On Tue, 6 May 2025 at 11:43, Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >>>>>
> > >>>>> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
> > >>>>>> There was an issue with SO_LINGER: instead of blocking until all queued
> > >>>>>> messages for the socket have been successfully sent (or the linger timeout
> > >>>>>> has been reached), close() would block until packets were handled by the
> > >>>>>> peer.
> > >>>>>
> > >>>>> This is a new behaviour that only new kernels will follow, so I think
> > >>>>> it is better to add a new test instead of extending a pre-existing test
> > >>>>> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
> > >>>>>
> > >>>>> The old test should continue to check the null-ptr-deref also for old
> > >>>>> kernels, while the new test will check the new behaviour, so we can skip
> > >>>>> the new test while testing an old kernel.
> > >>>
> > >>> Right, I'll split it.
> > >>>
> > >>>> I also saw that we don't have any test to verify that actually the
> > >>>> lingering is working, should we add it since we are touching it?
> > >>>
> > >>> Yeah, I agree we should. Do you have any suggestion how this could be done
> > >>> reliably?
> > >>
> > >> Can we play with SO_VM_SOCKETS_BUFFER_SIZE like in credit-update tests?
> > >>
> > >> One peer can set it (e.g. to 1k), accept the connection, but without
> > >> reading anything. The other peer can set the linger timeout, send more
> > >> bytes than the buffer size set by the receiver.
> > >> At this point the extra bytes should stay on the sender socket buffer,
> > >> so we can do the close() and it should time out, and we can check if
> > >> it happens.
> > >>
> > >> WDYT?
> > >
> > >Haven't we discussed this approach in [1]? I've reported that I can't make
> >
> > Sorry, I forgot. What was the conclusion? Why can't this work?
> >
> > >it work. But maybe I'm misunderstanding something, please see the code
> > >below.
> >
> > What should I check in the code below?
>
> Okay, I see the send() is blocking (please explain the issue better next
> time, etc.)
>
> I don't want to block this series, so feel free to investigate that
> later if we have a way to test it. If I find some time, I'll try to
> check if we have a way.

I've tried to take a look, and no, there's no easy way except to somehow
slow down the receiver, but I don't think we have a reliable way to do
that, so I can't think of anything for now.
Let's skip it (I'll try to remember ;-)

Thanks,
Stefano