[v2] vsock: some fixes due to transport de-assignment

[PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.

A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Reported-by: Wongi Lee <qwerty@theori.io>
Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9acc13ab3f82..51a494b69be8 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	lock_sock(sk);
 
-	/* Check if sk has been closed before lock_sock */
-	if (sock_flag(sk, SOCK_DONE)) {
+	/* Check if sk has been closed or assigned to another transport before
+	 * lock_sock (note: listener sockets are not assigned to any transport)
+	 */
+	if (sock_flag(sk, SOCK_DONE) ||
+	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
 		(void)virtio_transport_reset_no_sock(t, skb);
 		release_sock(sk);
 		sock_put(sk);
-- 
2.47.1

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Michal Luczaj 1 year ago

On 1/10/25 09:35, Stefano Garzarella wrote:
> If the socket has been de-assigned or assigned to another transport,
> we must discard any packets received because they are not expected
> and would cause issues when we access vsk->transport.
> 
> A possible scenario is described by Hyunwoo Kim in the attached link,
> where after a first connect() interrupted by a signal, and a second
> connect() failed, we can find `vsk->transport` at NULL, leading to a
> NULL pointer dereference.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Cc: stable@vger.kernel.org
> Reported-by: Hyunwoo Kim <v4bel@theori.io>
> Reported-by: Wongi Lee <qwerty@theori.io>
> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 9acc13ab3f82..51a494b69be8 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  
>  	lock_sock(sk);
>  
> -	/* Check if sk has been closed before lock_sock */
> -	if (sock_flag(sk, SOCK_DONE)) {
> +	/* Check if sk has been closed or assigned to another transport before
> +	 * lock_sock (note: listener sockets are not assigned to any transport)
> +	 */
> +	if (sock_flag(sk, SOCK_DONE) ||
> +	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
>  		(void)virtio_transport_reset_no_sock(t, skb);
>  		release_sock(sk);
>  		sock_put(sk);

I wanted to check if such special-casing for TCP_LISTEN doesn't bother
BPF/sockmap, but instead I've hit a UAF.

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* net/vmw_vsock/af_vsock.c */
#define MAX_PORT_RETRIES	24

static void die(const char *msg)
{
	perror(msg);
	exit(-1);
}

int socket_bind(int port)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_LOCAL,
		.svm_port = port,
	};
	int s;

	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	if (s < 0)
		die("socket");

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)))
		die("bind");

	return s;
}

int main(void)
{
	struct sockaddr_vm addr;
	socklen_t alen = sizeof(addr);
	int dummy, i, s;

	/* Play with `static u32 port` in __vsock_bind_connectible()
	 * to fail vsock_auto_bind() at connect #1.
	 */
	dummy = socket_bind(VMADDR_PORT_ANY);
	if (getsockname(dummy, (struct sockaddr *)&addr, &alen))
		die("getsockname");
	for (i = 0; i < MAX_PORT_RETRIES; ++i)
		socket_bind(++addr.svm_port);

	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	if (s < 0)
		die("socket s");

	if (!connect(s, (struct sockaddr *)&addr, alen))
		die("connect #1");
	perror("ok, connect #1 failed; transport set, sk in unbound list");

	addr.svm_cid = 42; /* non-existing */
	if (!connect(s, (struct sockaddr *)&addr, alen))
		die("connect #2");
	/* vsock_assign_transport
	 *   virtio_transport_release (vsk->transport->release)
	 *     virtio_transport_remove_sock
	 *       vsock_remove_sock
	 *         vsock_remove_bound
	 *           __vsock_remove_bound
	 *             sock_put(&vsk->sk)
	 */
	perror("ok, connect #2 failed; transport unset, sk ref dropped");

	addr.svm_cid = VMADDR_CID_LOCAL;
	addr.svm_port = VMADDR_PORT_ANY;
	if (bind(s, (struct sockaddr *)&addr, alen))
		die("bind s");
	/* vsock_bind
	 *   __vsock_bind
	 *     __vsock_bind_connectible
	 *       __vsock_remove_bound
	 *         sock_put(&vsk->sk)
	 */

	printf("done\n");
	return 0;
}
```

=========================
WARNING: held lock freed!
6.13.0-rc6+ #146 Not tainted
-------------------------
a.out/2057 is freeing memory ffff88816b46a200-ffff88816b46a9f7, with a lock still held there!
ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
2 locks held by a.out/2057:
 #0: ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
 #1: ffffffff86574a78 (vsock_table_lock){+...}-{3:3}, at: __vsock_bind+0x129/0x730

stack backtrace:
CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0x90
 debug_check_no_locks_freed+0x21a/0x280
 ? lockdep_hardirqs_on+0x78/0x100
 kmem_cache_free+0x142/0x590
 ? security_sk_free+0x54/0xf0
 ? __sk_destruct+0x388/0x5a0
 __sk_destruct+0x388/0x5a0
 __vsock_bind+0x5e1/0x730
 ? __pfx___vsock_bind+0x10/0x10
 ? __local_bh_enable_ip+0xab/0x140
 vsock_bind+0x97/0xe0
 ? __pfx_vsock_bind+0x10/0x10
 __sys_bind+0x154/0x1f0
 ? __pfx___sys_bind+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 __x64_sys_bind+0x6e/0xb0
 ? lockdep_hardirqs_on+0x78/0x100
 do_syscall_64+0x93/0x1b0
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a618e34b
Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
 </TASK>
==================================================================
BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730
Read of size 4 at addr ffff88816b46a74c by task a.out/2057

CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0x90
 print_report+0x174/0x4f6
 ? __virt_addr_valid+0x208/0x400
 ? __vsock_bind+0x62e/0x730
 kasan_report+0xb9/0x190
 ? __vsock_bind+0x62e/0x730
 __vsock_bind+0x62e/0x730
 ? __pfx___vsock_bind+0x10/0x10
 ? __local_bh_enable_ip+0xab/0x140
 vsock_bind+0x97/0xe0
 ? __pfx_vsock_bind+0x10/0x10
 __sys_bind+0x154/0x1f0
 ? __pfx___sys_bind+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 __x64_sys_bind+0x6e/0xb0
 ? lockdep_hardirqs_on+0x78/0x100
 do_syscall_64+0x93/0x1b0
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a618e34b
Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
 </TASK>

Allocated by task 2057:
 kasan_save_stack+0x1e/0x40
 kasan_save_track+0x10/0x30
 __kasan_slab_alloc+0x85/0x90
 kmem_cache_alloc_noprof+0x131/0x450
 sk_prot_alloc+0x5b/0x220
 sk_alloc+0x2c/0x870
 __vsock_create.constprop.0+0x2e/0xb60
 vsock_create+0xe4/0x420
 __sock_create+0x241/0x650
 __sys_socket+0xf2/0x1a0
 __x64_sys_socket+0x6e/0xb0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Freed by task 2057:
 kasan_save_stack+0x1e/0x40
 kasan_save_track+0x10/0x30
 kasan_save_free_info+0x37/0x60
 __kasan_slab_free+0x4b/0x70
 kmem_cache_free+0x1a1/0x590
 __sk_destruct+0x388/0x5a0
 __vsock_bind+0x5e1/0x730
 vsock_bind+0x97/0xe0
 __sys_bind+0x154/0x1f0
 __x64_sys_bind+0x6e/0xb0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The buggy address belongs to the object at ffff88816b46a200
 which belongs to the cache AF_VSOCK of size 2040
The buggy address is located 1356 bytes inside of
 freed 2040-byte region [ffff88816b46a200, ffff88816b46a9f8)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16b468
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff888125368401
flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
head: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
head: 0017ffffc0000003 ffffea0005ad1a01 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88816b46a600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88816b46a680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88816b46a700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
 ffff88816b46a780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88816b46a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B              6.13.0-rc6+ #146
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xce/0x150
Code: 7b fe d8 03 01 e8 22 db ac fe 0f 0b eb b1 80 3d 6e fe d8 03 00 75 a8 48 c7 c7 e0 da 95 84 c6 05 5e fe d8 03 01 e8 02 db ac fe <0f> 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05
RSP: 0018:ffff8881285c7c90 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcd76349
R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a768
R13: ffff88816b46a280 R14: ffff88816b46a770 R15: ffffffff88901520
FS:  00007fa9a606e740(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000010f8d488 CR3: 0000000130c4a000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1ff
 ? refcount_warn_saturate+0xce/0x150
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0xce/0x150
 __vsock_bind+0x66d/0x730
 ? __pfx___vsock_bind+0x10/0x10
 ? __local_bh_enable_ip+0xab/0x140
 vsock_bind+0x97/0xe0
 ? __pfx_vsock_bind+0x10/0x10
 __sys_bind+0x154/0x1f0
 ? __pfx___sys_bind+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 __x64_sys_bind+0x6e/0xb0
 ? lockdep_hardirqs_on+0x78/0x100
 do_syscall_64+0x93/0x1b0
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x1b0
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a618e34b
Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
 </TASK>
irq event stamp: 9836
hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B   W          6.13.0-rc6+ #146
Tainted: [B]=BAD_PAGE, [W]=WARN
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0xee/0x150
Code: 5e fe d8 03 01 e8 02 db ac fe 0f 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05 3d fe d8 03 01 e8 e2 da ac fe <0f> 0b e9 6e ff ff ff 80 3d 2d fe d8 03 00 0f 85 61 ff ff ff 48 c7
RSP: 0018:ffff8881285c7b68 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcd76349
R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a770
R13: ffffffff88901520 R14: ffffffff88901520 R15: ffff888128cff640
FS:  0000000000000000(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa9a6156050 CR3: 0000000005a74000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1ff
 ? refcount_warn_saturate+0xee/0x150
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0xee/0x150
 ? refcount_warn_saturate+0xee/0x150
 vsock_remove_bound+0x187/0x1e0
 __vsock_release+0x383/0x4a0
 ? down_write+0x129/0x1c0
 vsock_release+0x90/0x120
 __sock_release+0xa3/0x250
 sock_close+0x14/0x20
 __fput+0x359/0xa80
 ? trace_irq_enable.constprop.0+0xce/0x110
 task_work_run+0x107/0x1d0
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? __pfx_task_work_run+0x10/0x10
 do_exit+0x847/0x2560
 ? __pfx_lock_release+0x10/0x10
 ? do_raw_spin_lock+0x11a/0x240
 ? __pfx_do_exit+0x10/0x10
 ? rcu_is_watching+0x11/0xb0
 ? trace_irq_enable.constprop.0+0xce/0x110
 do_group_exit+0xb8/0x250
 __x64_sys_exit_group+0x3a/0x50
 x64_sys_call+0xfec/0x14f0
 do_syscall_64+0x93/0x1b0
 ? __pfx___up_read+0x10/0x10
 ? rcu_is_watching+0x11/0xb0
 ? trace_irq_enable.constprop.0+0xce/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9a615606d
Code: Unable to access opcode bytes at 0x7fa9a6156043.
RSP: 002b:00007fff5e2d2f58 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007fa9a6259fa8 RCX: 00007fa9a615606d
RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
RBP: 00007fff5e2d2fb0 R08: 00007fff5e2d2f00 R09: 00007fff5e2d2e8f
R10: 00007fff5e2d2e10 R11: 0000000000000206 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fa9a6258680 R15: 00007fa9a6259fc0
 </TASK>
irq event stamp: 9836
hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
---[ end trace 0000000000000000 ]---

So, if I get this right:
1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
2. transport->release() calls vsock_remove_bound() without checking if sk
   was bound and moved to bound list (refcnt=1)
3. vsock_bind() assumes sk is in unbound list and before
   __vsock_insert_bound(vsock_bound_sockets()) calls
   __vsock_remove_bound() which does:
      list_del_init(&vsk->bound_table); // nop
      sock_put(&vsk->sk);               // refcnt=0

The following fixes things for me. I'm just not certain that's the only
place where transport destruction may lead to an unbound socket being
removed from the unbound list.

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 7f7de6d88096..0fe807c8c052 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
 
 	if (remove_sock) {
 		sock_set_flag(sk, SOCK_DONE);
-		virtio_transport_remove_sock(vsk);
+		if (vsock_addr_bound(&vsk->local_addr))
+			virtio_transport_remove_sock(vsk);
 	}
 }
 EXPORT_SYMBOL_GPL(virtio_transport_release);

Thanks,
Michal

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>On 1/10/25 09:35, Stefano Garzarella wrote:
>> If the socket has been de-assigned or assigned to another transport,
>> we must discard any packets received because they are not expected
>> and would cause issues when we access vsk->transport.
>>
>> A possible scenario is described by Hyunwoo Kim in the attached link,
>> where after a first connect() interrupted by a signal, and a second
>> connect() failed, we can find `vsk->transport` at NULL, leading to a
>> NULL pointer dereference.
>>
>> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>> Cc: stable@vger.kernel.org
>> Reported-by: Hyunwoo Kim <v4bel@theori.io>
>> Reported-by: Wongi Lee <qwerty@theori.io>
>> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>  net/vmw_vsock/virtio_transport_common.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 9acc13ab3f82..51a494b69be8 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>
>>  	lock_sock(sk);
>>
>> -	/* Check if sk has been closed before lock_sock */
>> -	if (sock_flag(sk, SOCK_DONE)) {
>> +	/* Check if sk has been closed or assigned to another transport before
>> +	 * lock_sock (note: listener sockets are not assigned to any transport)
>> +	 */
>> +	if (sock_flag(sk, SOCK_DONE) ||
>> +	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
>>  		(void)virtio_transport_reset_no_sock(t, skb);
>>  		release_sock(sk);
>>  		sock_put(sk);
>
>I wanted to check if such special-casing for TCP_LISTEN doesn't bother
>BPF/sockmap, but instead I've hit a UAF.
>
>```
>#include <stdio.h>
>#include <stdlib.h>
>#include <sys/socket.h>
>#include <linux/vm_sockets.h>
>
>/* net/vmw_vsock/af_vsock.c */
>#define MAX_PORT_RETRIES	24
>
>static void die(const char *msg)
>{
>	perror(msg);
>	exit(-1);
>}
>
>int socket_bind(int port)
>{
>	struct sockaddr_vm addr = {
>		.svm_family = AF_VSOCK,
>		.svm_cid = VMADDR_CID_LOCAL,
>		.svm_port = port,
>	};
>	int s;
>
>	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>	if (s < 0)
>		die("socket");
>
>	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)))
>		die("bind");
>
>	return s;
>}
>
>int main(void)
>{
>	struct sockaddr_vm addr;
>	socklen_t alen = sizeof(addr);
>	int dummy, i, s;
>
>	/* Play with `static u32 port` in __vsock_bind_connectible()
>	 * to fail vsock_auto_bind() at connect #1.
>	 */
>	dummy = socket_bind(VMADDR_PORT_ANY);
>	if (getsockname(dummy, (struct sockaddr *)&addr, &alen))
>		die("getsockname");
>	for (i = 0; i < MAX_PORT_RETRIES; ++i)
>		socket_bind(++addr.svm_port);
>
>	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>	if (s < 0)
>		die("socket s");
>
>	if (!connect(s, (struct sockaddr *)&addr, alen))
>		die("connect #1");
>	perror("ok, connect #1 failed; transport set, sk in unbound list");
>
>	addr.svm_cid = 42; /* non-existing */
>	if (!connect(s, (struct sockaddr *)&addr, alen))
>		die("connect #2");
>	/* vsock_assign_transport
>	 *   virtio_transport_release (vsk->transport->release)
>	 *     virtio_transport_remove_sock
>	 *       vsock_remove_sock
>	 *         vsock_remove_bound
>	 *           __vsock_remove_bound
>	 *             sock_put(&vsk->sk)
>	 */
>	perror("ok, connect #2 failed; transport unset, sk ref dropped");
>
>	addr.svm_cid = VMADDR_CID_LOCAL;
>	addr.svm_port = VMADDR_PORT_ANY;
>	if (bind(s, (struct sockaddr *)&addr, alen))
>		die("bind s");
>	/* vsock_bind
>	 *   __vsock_bind
>	 *     __vsock_bind_connectible
>	 *       __vsock_remove_bound
>	 *         sock_put(&vsk->sk)
>	 */
>
>	printf("done\n");
>	return 0;
>}
>```
>
>=========================
>WARNING: held lock freed!
>6.13.0-rc6+ #146 Not tainted
>-------------------------
>a.out/2057 is freeing memory ffff88816b46a200-ffff88816b46a9f7, with a lock still held there!
>ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
>2 locks held by a.out/2057:
> #0: ffff88816b46a458 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_bind+0x8a/0xe0
> #1: ffffffff86574a78 (vsock_table_lock){+...}-{3:3}, at: __vsock_bind+0x129/0x730
>
>stack backtrace:
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>Call Trace:
> <TASK>
> dump_stack_lvl+0x68/0x90
> debug_check_no_locks_freed+0x21a/0x280
> ? lockdep_hardirqs_on+0x78/0x100
> kmem_cache_free+0x142/0x590
> ? security_sk_free+0x54/0xf0
> ? __sk_destruct+0x388/0x5a0
> __sk_destruct+0x388/0x5a0
> __vsock_bind+0x5e1/0x730
> ? __pfx___vsock_bind+0x10/0x10
> ? __local_bh_enable_ip+0xab/0x140
> vsock_bind+0x97/0xe0
> ? __pfx_vsock_bind+0x10/0x10
> __sys_bind+0x154/0x1f0
> ? __pfx___sys_bind+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> __x64_sys_bind+0x6e/0xb0
> ? lockdep_hardirqs_on+0x78/0x100
> do_syscall_64+0x93/0x1b0
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a618e34b
>Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
>RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
>RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
>RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
>RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
>R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
> </TASK>
>==================================================================
>BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730
>Read of size 4 at addr ffff88816b46a74c by task a.out/2057
>
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Not tainted 6.13.0-rc6+ #146
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>Call Trace:
> <TASK>
> dump_stack_lvl+0x68/0x90
> print_report+0x174/0x4f6
> ? __virt_addr_valid+0x208/0x400
> ? __vsock_bind+0x62e/0x730
> kasan_report+0xb9/0x190
> ? __vsock_bind+0x62e/0x730
> __vsock_bind+0x62e/0x730
> ? __pfx___vsock_bind+0x10/0x10
> ? __local_bh_enable_ip+0xab/0x140
> vsock_bind+0x97/0xe0
> ? __pfx_vsock_bind+0x10/0x10
> __sys_bind+0x154/0x1f0
> ? __pfx___sys_bind+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> __x64_sys_bind+0x6e/0xb0
> ? lockdep_hardirqs_on+0x78/0x100
> do_syscall_64+0x93/0x1b0
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a618e34b
>Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
>RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
>RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
>RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
>RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
>R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
> </TASK>
>
>Allocated by task 2057:
> kasan_save_stack+0x1e/0x40
> kasan_save_track+0x10/0x30
> __kasan_slab_alloc+0x85/0x90
> kmem_cache_alloc_noprof+0x131/0x450
> sk_prot_alloc+0x5b/0x220
> sk_alloc+0x2c/0x870
> __vsock_create.constprop.0+0x2e/0xb60
> vsock_create+0xe4/0x420
> __sock_create+0x241/0x650
> __sys_socket+0xf2/0x1a0
> __x64_sys_socket+0x6e/0xb0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>Freed by task 2057:
> kasan_save_stack+0x1e/0x40
> kasan_save_track+0x10/0x30
> kasan_save_free_info+0x37/0x60
> __kasan_slab_free+0x4b/0x70
> kmem_cache_free+0x1a1/0x590
> __sk_destruct+0x388/0x5a0
> __vsock_bind+0x5e1/0x730
> vsock_bind+0x97/0xe0
> __sys_bind+0x154/0x1f0
> __x64_sys_bind+0x6e/0xb0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>The buggy address belongs to the object at ffff88816b46a200
> which belongs to the cache AF_VSOCK of size 2040
>The buggy address is located 1356 bytes inside of
> freed 2040-byte region [ffff88816b46a200, ffff88816b46a9f8)
>
>The buggy address belongs to the physical page:
>page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16b468
>head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>memcg:ffff888125368401
>flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
>page_type: f5(slab)
>raw: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
>raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
>head: 0017ffffc0000040 ffff888110115540 dead000000000122 0000000000000000
>head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888125368401
>head: 0017ffffc0000003 ffffea0005ad1a01 ffffffffffffffff 0000000000000000
>head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
>page dumped because: kasan: bad access detected
>
>Memory state around the buggy address:
> ffff88816b46a600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88816b46a680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>ffff88816b46a700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                              ^
> ffff88816b46a780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88816b46a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>==================================================================
>------------[ cut here ]------------
>refcount_t: addition on 0; use-after-free.
>WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
>Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B              6.13.0-rc6+ #146
>Tainted: [B]=BAD_PAGE
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>RIP: 0010:refcount_warn_saturate+0xce/0x150
>Code: 7b fe d8 03 01 e8 22 db ac fe 0f 0b eb b1 80 3d 6e fe d8 03 00 75 a8 48 c7 c7 e0 da 95 84 c6 05 5e fe d8 03 01 e8 02 db ac fe <0f> 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05
>RSP: 0018:ffff8881285c7c90 EFLAGS: 00010282
>RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
>RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
>RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcd76349
>R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a768
>R13: ffff88816b46a280 R14: ffff88816b46a770 R15: ffffffff88901520
>FS:  00007fa9a606e740(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 0000000010f8d488 CR3: 0000000130c4a000 CR4: 0000000000752ef0
>PKRU: 55555554
>Call Trace:
> <TASK>
> ? __warn.cold+0x5f/0x1ff
> ? refcount_warn_saturate+0xce/0x150
> ? report_bug+0x1ec/0x390
> ? handle_bug+0x58/0x90
> ? exc_invalid_op+0x13/0x40
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0xce/0x150
> __vsock_bind+0x66d/0x730
> ? __pfx___vsock_bind+0x10/0x10
> ? __local_bh_enable_ip+0xab/0x140
> vsock_bind+0x97/0xe0
> ? __pfx_vsock_bind+0x10/0x10
> __sys_bind+0x154/0x1f0
> ? __pfx___sys_bind+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> __x64_sys_bind+0x6e/0xb0
> ? lockdep_hardirqs_on+0x78/0x100
> do_syscall_64+0x93/0x1b0
> ? lockdep_hardirqs_on_prepare+0x16d/0x400
> ? do_syscall_64+0x9f/0x1b0
> ? lockdep_hardirqs_on+0x78/0x100
> ? do_syscall_64+0x9f/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a618e34b
>Code: c3 66 0f 1f 44 00 00 48 8b 15 c9 9a 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb c1 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d 9a 0c 00 f7 d8 64 89 01 48
>RSP: 002b:00007fff5e2d2f88 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
>RAX: ffffffffffffffda RBX: 00007fff5e2d30e8 RCX: 00007fa9a618e34b
>RDX: 0000000000000010 RSI: 00007fff5e2d2fa0 RDI: 000000000000001c
>RBP: 00007fff5e2d2fc0 R08: 0000000010f8c010 R09: 0000000000000007
>R10: 0000000010f8c2a0 R11: 0000000000000202 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a62b0000 R15: 0000000000403e00
> </TASK>
>irq event stamp: 9836
>hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
>hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
>softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
>softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
>---[ end trace 0000000000000000 ]---
>------------[ cut here ]------------
>refcount_t: underflow; use-after-free.
>WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
>Modules linked in: 9p kvm_intel kvm 9pnet_virtio 9pnet netfs i2c_piix4 i2c_smbus zram virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
>CPU: 7 UID: 1000 PID: 2057 Comm: a.out Tainted: G    B   W          6.13.0-rc6+ #146
>Tainted: [B]=BAD_PAGE, [W]=WARN
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>RIP: 0010:refcount_warn_saturate+0xee/0x150
>Code: 5e fe d8 03 01 e8 02 db ac fe 0f 0b eb 91 80 3d 4d fe d8 03 00 75 88 48 c7 c7 40 db 95 84 c6 05 3d fe d8 03 01 e8 e2 da ac fe <0f> 0b e9 6e ff ff ff 80 3d 2d fe d8 03 00 0f 85 61 ff ff ff 48 c7
>RSP: 0018:ffff8881285c7b68 EFLAGS: 00010296
>RAX: 0000000000000000 RBX: ffff88816b46a280 RCX: 0000000000000000
>RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
>RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcd76349
>R10: ffff8885e6bb1a4b R11: 0000000000000000 R12: ffff88816b46a770
>R13: ffffffff88901520 R14: ffffffff88901520 R15: ffff888128cff640
>FS:  0000000000000000(0000) GS:ffff8885e6b80000(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 00007fa9a6156050 CR3: 0000000005a74000 CR4: 0000000000752ef0
>PKRU: 55555554
>Call Trace:
> <TASK>
> ? __warn.cold+0x5f/0x1ff
> ? refcount_warn_saturate+0xee/0x150
> ? report_bug+0x1ec/0x390
> ? handle_bug+0x58/0x90
> ? exc_invalid_op+0x13/0x40
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0xee/0x150
> ? refcount_warn_saturate+0xee/0x150
> vsock_remove_bound+0x187/0x1e0
> __vsock_release+0x383/0x4a0
> ? down_write+0x129/0x1c0
> vsock_release+0x90/0x120
> __sock_release+0xa3/0x250
> sock_close+0x14/0x20
> __fput+0x359/0xa80
> ? trace_irq_enable.constprop.0+0xce/0x110
> task_work_run+0x107/0x1d0
> ? __pfx_do_raw_spin_lock+0x10/0x10
> ? __pfx_task_work_run+0x10/0x10
> do_exit+0x847/0x2560
> ? __pfx_lock_release+0x10/0x10
> ? do_raw_spin_lock+0x11a/0x240
> ? __pfx_do_exit+0x10/0x10
> ? rcu_is_watching+0x11/0xb0
> ? trace_irq_enable.constprop.0+0xce/0x110
> do_group_exit+0xb8/0x250
> __x64_sys_exit_group+0x3a/0x50
> x64_sys_call+0xfec/0x14f0
> do_syscall_64+0x93/0x1b0
> ? __pfx___up_read+0x10/0x10
> ? rcu_is_watching+0x11/0xb0
> ? trace_irq_enable.constprop.0+0xce/0x110
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>RIP: 0033:0x7fa9a615606d
>Code: Unable to access opcode bytes at 0x7fa9a6156043.
>RSP: 002b:00007fff5e2d2f58 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
>RAX: ffffffffffffffda RBX: 00007fa9a6259fa8 RCX: 00007fa9a615606d
>RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
>RBP: 00007fff5e2d2fb0 R08: 00007fff5e2d2f00 R09: 00007fff5e2d2e8f
>R10: 00007fff5e2d2e10 R11: 0000000000000206 R12: 0000000000000001
>R13: 0000000000000000 R14: 00007fa9a6258680 R15: 00007fa9a6259fc0
> </TASK>
>irq event stamp: 9836
>hardirqs last  enabled at (9836): [<ffffffff8152121f>] __call_rcu_common.constprop.0+0x32f/0xe90
>hardirqs last disabled at (9835): [<ffffffff8152127c>] __call_rcu_common.constprop.0+0x38c/0xe90
>softirqs last  enabled at (9810): [<ffffffff84168aca>] vsock_bind+0x8a/0xe0
>softirqs last disabled at (9812): [<ffffffff84168429>] __vsock_bind+0x129/0x730
>---[ end trace 0000000000000000 ]---
>
>So, if I get this right:
>1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>2. transport->release() calls vsock_remove_bound() without checking if sk
>   was bound and moved to bound list (refcnt=1)
>3. vsock_bind() assumes sk is in unbound list and before
>   __vsock_insert_bound(vsock_bound_sockets()) calls
>   __vsock_remove_bound() which does:
>      list_del_init(&vsk->bound_table); // nop
>      sock_put(&vsk->sk);               // refcnt=0
>
>The following fixes things for me. I'm just not certain that's the only
>place where transport destruction may lead to an unbound socket being
>removed from the unbound list.
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 7f7de6d88096..0fe807c8c052 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>
> 	if (remove_sock) {
> 		sock_set_flag(sk, SOCK_DONE);
>-		virtio_transport_remove_sock(vsk);
>+		if (vsock_addr_bound(&vsk->local_addr))
>+			virtio_transport_remove_sock(vsk);

I don't get this fix, virtio_transport_remove_sock() calls
   vsock_remove_sock()
     vsock_remove_bound()
       if (__vsock_in_bound_table(vsk))
           __vsock_remove_bound(vsk);


So, should already avoid this issue, no?

Can the problem be in vsock_bind() ?

Is this issue pre-existing or introduced by this series?

I'll also investigate a bit more.

Thanks,
Stefano

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:

[...]

> >
> >So, if I get this right:
> >1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
> >2. transport->release() calls vsock_remove_bound() without checking if sk
> >   was bound and moved to bound list (refcnt=1)
> >3. vsock_bind() assumes sk is in unbound list and before
> >   __vsock_insert_bound(vsock_bound_sockets()) calls
> >   __vsock_remove_bound() which does:
> >      list_del_init(&vsk->bound_table); // nop
> >      sock_put(&vsk->sk);               // refcnt=0
> >
> >The following fixes things for me. I'm just not certain that's the only
> >place where transport destruction may lead to an unbound socket being
> >removed from the unbound list.
> >
> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >index 7f7de6d88096..0fe807c8c052 100644
> >--- a/net/vmw_vsock/virtio_transport_common.c
> >+++ b/net/vmw_vsock/virtio_transport_common.c
> >@@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
> >
> >       if (remove_sock) {
> >               sock_set_flag(sk, SOCK_DONE);
> >-              virtio_transport_remove_sock(vsk);
> >+              if (vsock_addr_bound(&vsk->local_addr))
> >+                      virtio_transport_remove_sock(vsk);
>
> I don't get this fix, virtio_transport_remove_sock() calls
>    vsock_remove_sock()
>      vsock_remove_bound()
>        if (__vsock_in_bound_table(vsk))
>            __vsock_remove_bound(vsk);
>
>
> So, should already avoid this issue, no?

I got it wrong, I see now what are you trying to do, but I don't think
we should skip virtio_transport_remove_sock() entirely, it also purge
the rx_queue.

>
> Can the problem be in vsock_bind() ?
>
> Is this issue pre-existing or introduced by this series?

I think this is pre-existing, can you confirm?

In that case, I'd not stop this series, and fix it in another patch/series.

Thanks,
Stefano

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Michal Luczaj 1 year ago

On 1/13/25 10:07, Stefano Garzarella wrote:
> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
> 
> [...]
> 
>>>
>>> So, if I get this right:
>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>   was bound and moved to bound list (refcnt=1)
>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>   __vsock_remove_bound() which does:
>>>      list_del_init(&vsk->bound_table); // nop
>>>      sock_put(&vsk->sk);               // refcnt=0
>>>
>>> The following fixes things for me. I'm just not certain that's the only
>>> place where transport destruction may lead to an unbound socket being
>>> removed from the unbound list.
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 7f7de6d88096..0fe807c8c052 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>
>>>       if (remove_sock) {
>>>               sock_set_flag(sk, SOCK_DONE);
>>> -              virtio_transport_remove_sock(vsk);
>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>> +                      virtio_transport_remove_sock(vsk);
>>
>> I don't get this fix, virtio_transport_remove_sock() calls
>>    vsock_remove_sock()
>>      vsock_remove_bound()
>>        if (__vsock_in_bound_table(vsk))
>>            __vsock_remove_bound(vsk);
>>
>>
>> So, should already avoid this issue, no?
> 
> I got it wrong, I see now what are you trying to do, but I don't think
> we should skip virtio_transport_remove_sock() entirely, it also purge
> the rx_queue.

Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?

>> Can the problem be in vsock_bind() ?

Well, I wouldn't say so.

>> Is this issue pre-existing or introduced by this series?
> 
> I think this is pre-existing, can you confirm?

Yup, I agree, pre-existing.

> In that case, I'd not stop this series, and fix it in another patch/series.

Yeah, sure thing.

Thanks,
Michal

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>On 1/13/25 10:07, Stefano Garzarella wrote:
>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>
>> [...]
>>
>>>>
>>>> So, if I get this right:
>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>   was bound and moved to bound list (refcnt=1)
>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>   __vsock_remove_bound() which does:
>>>>      list_del_init(&vsk->bound_table); // nop
>>>>      sock_put(&vsk->sk);               // refcnt=0
>>>>
>>>> The following fixes things for me. I'm just not certain that's the only
>>>> place where transport destruction may lead to an unbound socket being
>>>> removed from the unbound list.
>>>>
>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>
>>>>       if (remove_sock) {
>>>>               sock_set_flag(sk, SOCK_DONE);
>>>> -              virtio_transport_remove_sock(vsk);
>>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>>> +                      virtio_transport_remove_sock(vsk);
>>>
>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>    vsock_remove_sock()
>>>      vsock_remove_bound()
>>>        if (__vsock_in_bound_table(vsk))
>>>            __vsock_remove_bound(vsk);
>>>
>>>
>>> So, should already avoid this issue, no?
>>
>> I got it wrong, I see now what are you trying to do, but I don't think
>> we should skip virtio_transport_remove_sock() entirely, it also purge
>> the rx_queue.
>
>Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?

It could be.

But I see some other issues:
- we need to fix also in the other transports, since they do the same
- we need to check delayed cancel work too that call 
   virtio_transport_remove_sock()

An alternative approach, which would perhaps allow us to avoid all this, 
is to re-insert the socket in the unbound list after calling release() 
when we deassign the transport.

WDYT?

Stefano

>
>>> Can the problem be in vsock_bind() ?
>
>Well, I wouldn't say so.
>
>>> Is this issue pre-existing or introduced by this series?
>>
>> I think this is pre-existing, can you confirm?
>
>Yup, I agree, pre-existing.
>
>> In that case, I'd not stop this series, and fix it in another patch/series.
>
>Yeah, sure thing.
>
>Thanks,
>Michal
>

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Michal Luczaj 1 year ago

On 1/13/25 12:05, Stefano Garzarella wrote:
> On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>> On 1/13/25 10:07, Stefano Garzarella wrote:
>>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>>
>>> [...]
>>>
>>>>>
>>>>> So, if I get this right:
>>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>>   was bound and moved to bound list (refcnt=1)
>>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>>   __vsock_remove_bound() which does:
>>>>>      list_del_init(&vsk->bound_table); // nop
>>>>>      sock_put(&vsk->sk);               // refcnt=0
>>>>>
>>>>> The following fixes things for me. I'm just not certain that's the only
>>>>> place where transport destruction may lead to an unbound socket being
>>>>> removed from the unbound list.
>>>>>
>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>>
>>>>>       if (remove_sock) {
>>>>>               sock_set_flag(sk, SOCK_DONE);
>>>>> -              virtio_transport_remove_sock(vsk);
>>>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>>>> +                      virtio_transport_remove_sock(vsk);
>>>>
>>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>>    vsock_remove_sock()
>>>>      vsock_remove_bound()
>>>>        if (__vsock_in_bound_table(vsk))
>>>>            __vsock_remove_bound(vsk);
>>>>
>>>>
>>>> So, should already avoid this issue, no?
>>>
>>> I got it wrong, I see now what are you trying to do, but I don't think
>>> we should skip virtio_transport_remove_sock() entirely, it also purge
>>> the rx_queue.
>>
>> Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?
> 
> It could be.
> 
> But I see some other issues:
> - we need to fix also in the other transports, since they do the same

Ahh, yes, VMCI and Hyper-V would need that, too.

> - we need to check delayed cancel work too that call 
>    virtio_transport_remove_sock()

That's the "I'm just not certain" part. As with rx_queue, I though delayed
cancel can only happen for a bound socket. So, per architecture, no need to
deal with that here, right?

> An alternative approach, which would perhaps allow us to avoid all this, 
> is to re-insert the socket in the unbound list after calling release() 
> when we deassign the transport.
> 
> WDYT?

If we can't keep the old state (sk_state, transport, etc) on failed
re-connect() then reverting back to initial state sounds, uhh, like an
option :) I'm not sure how well this aligns with (user's expectations of)
good ol' socket API, but maybe that train has already left.

Another possibility would be to simply brick the socket on failed (re)connect.

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>On 1/13/25 12:05, Stefano Garzarella wrote:
>> On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>>> On 1/13/25 10:07, Stefano Garzarella wrote:
>>>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> So, if I get this right:
>>>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>>>   was bound and moved to bound list (refcnt=1)
>>>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>>>   __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>>>   __vsock_remove_bound() which does:
>>>>>>      list_del_init(&vsk->bound_table); // nop
>>>>>>      sock_put(&vsk->sk);               // refcnt=0
>>>>>>
>>>>>> The following fixes things for me. I'm just not certain that's the only
>>>>>> place where transport destruction may lead to an unbound socket being
>>>>>> removed from the unbound list.
>>>>>>
>>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>>>
>>>>>>       if (remove_sock) {
>>>>>>               sock_set_flag(sk, SOCK_DONE);
>>>>>> -              virtio_transport_remove_sock(vsk);
>>>>>> +              if (vsock_addr_bound(&vsk->local_addr))
>>>>>> +                      virtio_transport_remove_sock(vsk);
>>>>>
>>>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>>>    vsock_remove_sock()
>>>>>      vsock_remove_bound()
>>>>>        if (__vsock_in_bound_table(vsk))
>>>>>            __vsock_remove_bound(vsk);
>>>>>
>>>>>
>>>>> So, should already avoid this issue, no?
>>>>
>>>> I got it wrong, I see now what are you trying to do, but I don't think
>>>> we should skip virtio_transport_remove_sock() entirely, it also purge
>>>> the rx_queue.
>>>
>>> Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?
>>
>> It could be.
>>
>> But I see some other issues:
>> - we need to fix also in the other transports, since they do the same
>
>Ahh, yes, VMCI and Hyper-V would need that, too.
>
>> - we need to check delayed cancel work too that call
>>    virtio_transport_remove_sock()
>
>That's the "I'm just not certain" part. As with rx_queue, I though delayed
>cancel can only happen for a bound socket. So, per architecture, no need to
>deal with that here, right?

I think so.

>
>> An alternative approach, which would perhaps allow us to avoid all this,
>> is to re-insert the socket in the unbound list after calling release()
>> when we deassign the transport.
>>
>> WDYT?
>
>If we can't keep the old state (sk_state, transport, etc) on failed
>re-connect() then reverting back to initial state sounds, uhh, like an
>option :) I'm not sure how well this aligns with (user's expectations of)
>good ol' socket API, but maybe that train has already left.

We really want to behave as similar as possible with the other sockets,
like AF_INET, so I would try to continue toward that train.

>
>Another possibility would be to simply brick the socket on failed (re)connect.
>

I see, though, this is not the behavior of AF_INET for example, right?

Do you have time to investigate/fix this problem?
If not, I'll try to look into it in the next few days, maybe next week.

Thanks,
Stefano

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Michal Luczaj 1 year ago

On 1/13/25 16:01, Stefano Garzarella wrote:
> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>> ...
>>> An alternative approach, which would perhaps allow us to avoid all this,
>>> is to re-insert the socket in the unbound list after calling release()
>>> when we deassign the transport.
>>>
>>> WDYT?
>>
>> If we can't keep the old state (sk_state, transport, etc) on failed
>> re-connect() then reverting back to initial state sounds, uhh, like an
>> option :) I'm not sure how well this aligns with (user's expectations of)
>> good ol' socket API, but maybe that train has already left.
> 
> We really want to behave as similar as possible with the other sockets,
> like AF_INET, so I would try to continue toward that train.

I was worried that such connect()/transport error handling may have some
user visible side effects, but I guess I was wrong. I mean you can still
reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
that's a different issue.

I've tried your suggestion on top of this series. Passes the tests.

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index fa9d1b49599b..4718fe86689d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		vsk->transport->release(vsk);
 		vsock_deassign_transport(vsk);
 
+		vsock_addr_unbind(&vsk->local_addr);
+		vsock_addr_unbind(&vsk->remote_addr);
+		vsock_insert_unbound(vsk);
+
 		/* transport's release() and destruct() can touch some socket
 		 * state, since we are reassigning the socket to a new transport
 		 * during vsock_connect(), let's reset these fields to have a

>> Another possibility would be to simply brick the socket on failed (re)connect.
> 
> I see, though, this is not the behavior of AF_INET for example, right?

Right.

> Do you have time to investigate/fix this problem?
> If not, I'll try to look into it in the next few days, maybe next week.

I'm happy to help, but it's not like I have any better ideas.

Michal

[1]: E.g. this way:
```
from socket import *

MAX_PORT_RETRIES = 24 # net/vmw_vsock/af_vsock.c
VMADDR_CID_LOCAL = 1
VMADDR_PORT_ANY = -1
hold = []

def take_port(port):
	s = socket(AF_VSOCK, SOCK_SEQPACKET)
	s.bind((VMADDR_CID_LOCAL, port))
	hold.append(s)
	return s

s = take_port(VMADDR_PORT_ANY)
_, port = s.getsockname()
for _ in range(MAX_PORT_RETRIES):
	port += 1
	take_port(port);

s = socket(AF_VSOCK, SOCK_SEQPACKET)
err = s.connect_ex((VMADDR_CID_LOCAL, port))
assert err != 0
print("ok, connect failed; transport set")

s.bind((VMADDR_CID_LOCAL, port+1))
s.listen(16)
```

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
>On 1/13/25 16:01, Stefano Garzarella wrote:
>> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>>> ...
>>>> An alternative approach, which would perhaps allow us to avoid all this,
>>>> is to re-insert the socket in the unbound list after calling release()
>>>> when we deassign the transport.
>>>>
>>>> WDYT?
>>>
>>> If we can't keep the old state (sk_state, transport, etc) on failed
>>> re-connect() then reverting back to initial state sounds, uhh, like an
>>> option :) I'm not sure how well this aligns with (user's expectations of)
>>> good ol' socket API, but maybe that train has already left.
>>
>> We really want to behave as similar as possible with the other sockets,
>> like AF_INET, so I would try to continue toward that train.
>
>I was worried that such connect()/transport error handling may have some
>user visible side effects, but I guess I was wrong. I mean you can still
>reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
>that's a different issue.
>
>I've tried your suggestion on top of this series. Passes the tests.

Great, thanks!

>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index fa9d1b49599b..4718fe86689d 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> 		vsk->transport->release(vsk);
> 		vsock_deassign_transport(vsk);
>
>+		vsock_addr_unbind(&vsk->local_addr);
>+		vsock_addr_unbind(&vsk->remote_addr);

My only doubt is that if a user did a specific bind() before the
connect, this way we're resetting everything, is that right?

Maybe we need to look better at the release, and prevent it from
removing the socket from the lists as you suggested, maybe adding a
function in af_vsock.c that all transports can call.

Thanks,
Stefano

>+		vsock_insert_unbound(vsk);
>+
> 		/* transport's release() and destruct() can touch some socket
> 		 * state, since we are reassigning the socket to a new transport
> 		 * during vsock_connect(), let's reset these fields to have a
>
>>> Another possibility would be to simply brick the socket on failed (re)connect.
>>
>> I see, though, this is not the behavior of AF_INET for example, right?
>
>Right.
>
>> Do you have time to investigate/fix this problem?
>> If not, I'll try to look into it in the next few days, maybe next week.
>
>I'm happy to help, but it's not like I have any better ideas.
>
>Michal
>
>[1]: E.g. this way:
>```
>from socket import *
>
>MAX_PORT_RETRIES = 24 # net/vmw_vsock/af_vsock.c
>VMADDR_CID_LOCAL = 1
>VMADDR_PORT_ANY = -1
>hold = []
>
>def take_port(port):
>	s = socket(AF_VSOCK, SOCK_SEQPACKET)
>	s.bind((VMADDR_CID_LOCAL, port))
>	hold.append(s)
>	return s
>
>s = take_port(VMADDR_PORT_ANY)
>_, port = s.getsockname()
>for _ in range(MAX_PORT_RETRIES):
>	port += 1
>	take_port(port);
>
>s = socket(AF_VSOCK, SOCK_SEQPACKET)
>err = s.connect_ex((VMADDR_CID_LOCAL, port))
>assert err != 0
>print("ok, connect failed; transport set")
>
>s.bind((VMADDR_CID_LOCAL, port+1))
>s.listen(16)
>```
>

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Michal Luczaj 1 year ago

On 1/14/25 11:16, Stefano Garzarella wrote:
> On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
>> On 1/13/25 16:01, Stefano Garzarella wrote:
>>> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>>>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>>>> ...
>>>>> An alternative approach, which would perhaps allow us to avoid all this,
>>>>> is to re-insert the socket in the unbound list after calling release()
>>>>> when we deassign the transport.
>>>>>
>>>>> WDYT?
>>>>
>>>> If we can't keep the old state (sk_state, transport, etc) on failed
>>>> re-connect() then reverting back to initial state sounds, uhh, like an
>>>> option :) I'm not sure how well this aligns with (user's expectations of)
>>>> good ol' socket API, but maybe that train has already left.
>>>
>>> We really want to behave as similar as possible with the other sockets,
>>> like AF_INET, so I would try to continue toward that train.
>>
>> I was worried that such connect()/transport error handling may have some
>> user visible side effects, but I guess I was wrong. I mean you can still
>> reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
>> that's a different issue.
>>
>> I've tried your suggestion on top of this series. Passes the tests.
> 
> Great, thanks!
> 
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index fa9d1b49599b..4718fe86689d 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>> 		vsk->transport->release(vsk);
>> 		vsock_deassign_transport(vsk);
>>
>> +		vsock_addr_unbind(&vsk->local_addr);
>> +		vsock_addr_unbind(&vsk->remote_addr);
> 
> My only doubt is that if a user did a specific bind() before the
> connect, this way we're resetting everything, is that right?

That is right.

But we aren't changing much. Transport release already removes vsk from
vsock_bound_sockets. So even though vsk->local_addr is untouched (i.e.
vsock_addr_bound() returns `true`), vsk can't be picked by
vsock_find_bound_socket(). User can't bind() it again, either.

And when patched as above: bind() works as "expected", but socket is pretty
much useless, anyway. If I'm correct, the first failing connect() trips
virtio_transport_recv_connecting(), which sets `sk->sk_err`. I don't see it
being reset. Does the vsock suppose to keep sk_err state once set?

Currently only AF_VSOCK throws ConnectionResetError:
```
from socket import *

def test(family, addr):
	s = socket(family, SOCK_STREAM)
	assert s.connect_ex(addr) != 0

	lis = socket(family, SOCK_STREAM)
	lis.bind(addr)
	lis.listen()
	s.connect(addr)

	p, _ = lis.accept()
	p.send(b'x')
	assert s.recv(1) == b'x'

test(AF_INET, ('127.0.0.1', 2000))
test(AF_UNIX, '\0/tmp/foo')
test(AF_VSOCK, (1, 2000)) # VMADDR_CID_LOCAL
```

> Maybe we need to look better at the release, and prevent it from
> removing the socket from the lists as you suggested, maybe adding a
> function in af_vsock.c that all transports can call.

I'd be happy to submit a proper patch, but it would be helpful to decide
how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you
rather have that UAF plugged first?

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Stefano Garzarella 1 year ago

On Tue, Jan 14, 2025 at 05:31:08PM +0100, Michal Luczaj wrote:
>On 1/14/25 11:16, Stefano Garzarella wrote:
>> On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
>>> On 1/13/25 16:01, Stefano Garzarella wrote:
>>>> On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>>>>> On 1/13/25 12:05, Stefano Garzarella wrote:
>>>>>> ...
>>>>>> An alternative approach, which would perhaps allow us to avoid all this,
>>>>>> is to re-insert the socket in the unbound list after calling release()
>>>>>> when we deassign the transport.
>>>>>>
>>>>>> WDYT?
>>>>>
>>>>> If we can't keep the old state (sk_state, transport, etc) on failed
>>>>> re-connect() then reverting back to initial state sounds, uhh, like an
>>>>> option :) I'm not sure how well this aligns with (user's expectations of)
>>>>> good ol' socket API, but maybe that train has already left.
>>>>
>>>> We really want to behave as similar as possible with the other sockets,
>>>> like AF_INET, so I would try to continue toward that train.
>>>
>>> I was worried that such connect()/transport error handling may have some
>>> user visible side effects, but I guess I was wrong. I mean you can still
>>> reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps
>>> that's a different issue.
>>>
>>> I've tried your suggestion on top of this series. Passes the tests.
>>
>> Great, thanks!
>>
>>>
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index fa9d1b49599b..4718fe86689d 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>>> 		vsk->transport->release(vsk);
>>> 		vsock_deassign_transport(vsk);
>>>
>>> +		vsock_addr_unbind(&vsk->local_addr);
>>> +		vsock_addr_unbind(&vsk->remote_addr);
>>
>> My only doubt is that if a user did a specific bind() before the
>> connect, this way we're resetting everything, is that right?
>
>That is right.
>
>But we aren't changing much. Transport release already removes vsk from
>vsock_bound_sockets. So even though vsk->local_addr is untouched (i.e.
>vsock_addr_bound() returns `true`), vsk can't be picked by
>vsock_find_bound_socket(). User can't bind() it again, either.

Okay, I see, so maybe for now makes sense to merge your patch, to fix 
the UAF fist.

>
>And when patched as above: bind() works as "expected", but socket is pretty
>much useless, anyway. If I'm correct, the first failing connect() trips
>virtio_transport_recv_connecting(), which sets `sk->sk_err`. I don't see it
>being reset. Does the vsock suppose to keep sk_err state once set?

Nope, I think this is another thing to fix.

>
>Currently only AF_VSOCK throws ConnectionResetError:
>```
>from socket import *
>
>def test(family, addr):
>	s = socket(family, SOCK_STREAM)
>	assert s.connect_ex(addr) != 0
>
>	lis = socket(family, SOCK_STREAM)
>	lis.bind(addr)
>	lis.listen()
>	s.connect(addr)
>
>	p, _ = lis.accept()
>	p.send(b'x')
>	assert s.recv(1) == b'x'
>
>test(AF_INET, ('127.0.0.1', 2000))
>test(AF_UNIX, '\0/tmp/foo')
>test(AF_VSOCK, (1, 2000)) # VMADDR_CID_LOCAL
>```
>
>> Maybe we need to look better at the release, and prevent it from
>> removing the socket from the lists as you suggested, maybe adding a
>> function in af_vsock.c that all transports can call.
>
>I'd be happy to submit a proper patch, but it would be helpful to decide
>how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you
>rather have that UAF plugged first?
>

I'd say, let's fix the UAF first, then fix the behaviour (also in a
single series, but I prefer 2 separate patches if possible).
About that, AF_VSOCK was started with the goal of following AF_INET as
closely as possible, and the test suite should serve that as well, so if
we can solve this problem and get closer to AF_INET, possibly even
adding a dedicated test, that would be ideal!

Thank you very much for the help!
Stefano

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Michal Luczaj 1 year ago

On 1/16/25 09:57, Stefano Garzarella wrote:
> On Tue, Jan 14, 2025 at 05:31:08PM +0100, Michal Luczaj wrote:
>>> ...
>>> Maybe we need to look better at the release, and prevent it from
>>> removing the socket from the lists as you suggested, maybe adding a
>>> function in af_vsock.c that all transports can call.
>>
>> I'd be happy to submit a proper patch, but it would be helpful to decide
>> how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you
>> rather have that UAF plugged first?
>>
> 
> I'd say, let's fix the UAF first, then fix the behaviour (also in a
> single series, but I prefer 2 separate patches if possible).
> About that, AF_VSOCK was started with the goal of following AF_INET as
> closely as possible, and the test suite should serve that as well, so if
> we can solve this problem and get closer to AF_INET, possibly even
> adding a dedicated test, that would be ideal!

All right, so let's keep the binding and allow removal from (un)bound list
only on socket destruction. This is transport independent, changes are
pretty minimal and, well, keeps the binding. Mixes well with the connect()
behaviour fix.

Let me know what you think:
https://lore.kernel.org/netdev/20250117-vsock-transport-vs-autobind-v1-0-c802c803762d@rbox.co/

Thanks,
Michal

Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes

Posted by Hyunwoo Kim 1 year ago

On Fri, Jan 10, 2025 at 09:35:07AM +0100, Stefano Garzarella wrote:
> If the socket has been de-assigned or assigned to another transport,
> we must discard any packets received because they are not expected
> and would cause issues when we access vsk->transport.
> 
> A possible scenario is described by Hyunwoo Kim in the attached link,
> where after a first connect() interrupted by a signal, and a second
> connect() failed, we can find `vsk->transport` at NULL, leading to a
> NULL pointer dereference.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Cc: stable@vger.kernel.org
> Reported-by: Hyunwoo Kim <v4bel@theori.io>
> Reported-by: Wongi Lee <qwerty@theori.io>
> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 9acc13ab3f82..51a494b69be8 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  
>  	lock_sock(sk);
>  
> -	/* Check if sk has been closed before lock_sock */
> -	if (sock_flag(sk, SOCK_DONE)) {
> +	/* Check if sk has been closed or assigned to another transport before
> +	 * lock_sock (note: listener sockets are not assigned to any transport)
> +	 */
> +	if (sock_flag(sk, SOCK_DONE) ||
> +	    (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
>  		(void)virtio_transport_reset_no_sock(t, skb);
>  		release_sock(sk);
>  		sock_put(sk);
> -- 
> 2.47.1
> 

Reviewed-by: Hyunwoo Kim <v4bel@theori.io>


Regards,
Hyunwoo Kim

[PATCH net v2 1/5] vsock/virtio: discard packets if the transport changes
[PATCH net v2 2/5] vsock/bpf: return early if transport is not assigned
[PATCH net v2 3/5] vsock/virtio: cancel close work in the destructor
[PATCH net v2 4/5] vsock: reset socket state when de-assigning the transport
[PATCH net v2 5/5] vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]