[PATCH net-next v12 02/12] vsock: add netns to vsock core

Bobby Eshleman posted 12 patches 4 days, 14 hours ago
Posted by Bobby Eshleman 4 days, 14 hours ago
From: Bobby Eshleman <bobbyeshleman@meta.com>

Add netns logic to vsock core. Additionally, modify transport hook
prototypes to be used by later transport-specific patches (e.g.,
*_seqpacket_allow()).

Namespaces are supported primarily by changing socket lookup functions
(e.g., vsock_find_connected_socket()) to take into account the socket
namespace and the namespace mode before considering a candidate socket a
"match".
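
The matching rule can be sketched as a small userspace model. This is
only an illustration: the names model_net, model_net_mode, and
model_net_check_mode() are hypothetical stand-ins, and the rule itself is
inferred from the mode description in af_vsock.c (two endpoints match if
they share a namespace, or if both are in global mode), not copied from
the kernel's vsock_net_check_mode().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the kernel definitions. */
enum model_net_mode { MODEL_MODE_GLOBAL, MODEL_MODE_LOCAL };
struct model_net { int id; };

/*
 * Assumed rule: a candidate socket (n0, m0) matches a lookup made on
 * behalf of (n1, m1) when both sides share a namespace, or when both
 * sides are in global mode.
 */
static bool model_net_check_mode(const struct model_net *n0,
				 enum model_net_mode m0,
				 const struct model_net *n1,
				 enum model_net_mode m1)
{
	if (n0 == n1)
		return true;

	return m0 == MODEL_MODE_GLOBAL && m1 == MODEL_MODE_GLOBAL;
}
```

Under this model, a lookup with net == NULL and global mode (as the
non-_net wrappers in this patch pass) still matches every global-mode
socket, and never matches a local-mode one.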

This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
accepts the "global" or "local" mode strings.
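
As a rough userspace sketch of how the two mode strings might be
validated: parse_ns_mode() and the parsed_ns_mode enum are hypothetical
names used only here. The 8-byte buffer mirrors VSOCK_NET_MODE_STR_MAX,
and stopping at the first newline is an assumption meant to mimic a write
such as "echo local > /proc/sys/net/vsock/ns_mode". The write-once
behaviour of the sysctl is not modelled.

```c
#include <string.h>

#define NS_MODE_STR_MAX 8	/* same size as VSOCK_NET_MODE_STR_MAX */

/* Hypothetical result type for this sketch only. */
enum parsed_ns_mode {
	PARSED_NS_MODE_INVALID = -1,
	PARSED_NS_MODE_GLOBAL,
	PARSED_NS_MODE_LOCAL,
};

/* Map user input to a mode; anything but "global" or "local" is rejected. */
static enum parsed_ns_mode parse_ns_mode(const char *input)
{
	char data[NS_MODE_STR_MAX] = {0};
	size_t len = strcspn(input, "\n");	/* ignore a trailing newline */

	if (len >= sizeof(data))
		return PARSED_NS_MODE_INVALID;

	memcpy(data, input, len);

	if (strcmp(data, "global") == 0)
		return PARSED_NS_MODE_GLOBAL;
	if (strcmp(data, "local") == 0)
		return PARSED_NS_MODE_LOCAL;

	return PARSED_NS_MODE_INVALID;
}
```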

Add netns functionality (initialization, passing to transports, procfs,
etc.) to the af_vsock socket layer. Later patches that add netns
support to transports depend on this patch.

dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
modified to take a vsk in order to perform logic on namespace modes. In
future patches, the net and net_mode will also be used for socket
lookups in these functions.
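
The effect of the new vsk argument can be sketched in userspace as
follows. The struct and function names (allow_sock_model,
model_stream_allow()) are hypothetical; the body mirrors what this patch
does in virtio_transport_stream_allow(), where a transport without
local-mode support admits only sockets created in global mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real socket carries far more state. */
enum allow_net_mode { ALLOW_MODE_GLOBAL, ALLOW_MODE_LOCAL };
struct allow_sock_model { enum allow_net_mode net_mode; };

/*
 * The callback now receives the socket, so it can refuse sockets whose
 * namespace mode the transport does not support. cid and port are kept
 * in the signature but unused here, as in the virtio common helper.
 */
static bool model_stream_allow(const struct allow_sock_model *vsk,
			       uint32_t cid, uint32_t port)
{
	(void)cid;
	(void)port;

	return vsk->net_mode == ALLOW_MODE_GLOBAL;
}
```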

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v12:
- return true in dgram_allow(), stream_allow(), and seqpacket_allow()
  only if net_mode == VSOCK_NET_MODE_GLOBAL (Stefano)
- document bind(VMADDR_CID_ANY) case in af_vsock.c (Stefano)
- change order of stream_allow() call in vmci so we can pass vsk
  to it

Changes in v10:
- add file-level comment about what happens to sockets/devices
  when the namespace mode changes (Stefano)
- change the 'if (write)' check in vsock_net_mode_string() to
  'if (!write)'; this simplifies a later patch which adds "goto"
  for mutex unlocking on function exit.

Changes in v9:
- remove virtio_vsock_alloc_rx_skb() (Stefano)
- remove vsock_global_dummy_net, not needed as net=NULL +
  net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result

Changes in v7:
- hv_sock: fix hyperv build error
- explain why vhost does not use the dummy
- explain usage of __vsock_global_dummy_net
- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
- use switch-case in vsock_net_mode_string()
- avoid changing transports as much as possible
- add vsock_find_{bound,connected}_socket_net()
- rename `vsock_hdr` to `sysctl_hdr`
- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
- explain seqpacket_allow() change
- move net setting to __vsock_create() instead of vsock_create() so
  that child sockets also have their net assigned upon accept()

Changes in v6:
- unregister sysctl ops in vsock_exit()
- af_vsock: clarify description of CID behavior
- af_vsock: fix buf vs buffer naming, and length checking
- af_vsock: fix length checking w/ correct ctl_table->maxlen

Changes in v5:
- vsock_global_net() -> vsock_global_dummy_net()
- update comments for new uAPI
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- add prototype changes so patch remains compilable
---
 drivers/vhost/vsock.c                   |   9 +-
 include/linux/virtio_vsock.h            |   4 +-
 include/net/af_vsock.h                  |  13 +-
 net/vmw_vsock/af_vsock.c                | 272 +++++++++++++++++++++++++++++---
 net/vmw_vsock/hyperv_transport.c        |   7 +-
 net/vmw_vsock/virtio_transport.c        |   9 +-
 net/vmw_vsock/virtio_transport_common.c |   6 +-
 net/vmw_vsock/vmci_transport.c          |  26 ++-
 net/vmw_vsock/vsock_loopback.c          |   8 +-
 9 files changed, 310 insertions(+), 44 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ae01457ea2cd..83937e1d63fa 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -404,7 +404,8 @@ static bool vhost_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
+					    u32 remote_cid);
 
 static struct virtio_transport vhost_transport = {
 	.transport = {
@@ -460,11 +461,15 @@ static struct virtio_transport vhost_transport = {
 	.send_pkt = vhost_transport_send_pkt,
 };
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
+					    u32 remote_cid)
 {
 	struct vhost_vsock *vsock;
 	bool seqpacket_allow = false;
 
+	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
+		return false;
+
 	rcu_read_lock();
 	vsock = vhost_vsock_get(remote_cid);
 
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0c67543a45c8..1845e8d4f78d 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -256,10 +256,10 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
 
 u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
 bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
-bool virtio_transport_stream_allow(u32 cid, u32 port);
+bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port);
 int virtio_transport_dgram_bind(struct vsock_sock *vsk,
 				struct sockaddr_vm *addr);
-bool virtio_transport_dgram_allow(u32 cid, u32 port);
+bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port);
 
 int virtio_transport_connect(struct vsock_sock *vsk);
 
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 9b5bdd083b6f..d10e73cd7413 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -126,7 +126,7 @@ struct vsock_transport {
 			     size_t len, int flags);
 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
 			     struct msghdr *, size_t len);
-	bool (*dgram_allow)(u32 cid, u32 port);
+	bool (*dgram_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
 
 	/* STREAM. */
 	/* TODO: stream_bind() */
@@ -138,14 +138,14 @@ struct vsock_transport {
 	s64 (*stream_has_space)(struct vsock_sock *);
 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
 	bool (*stream_is_active)(struct vsock_sock *);
-	bool (*stream_allow)(u32 cid, u32 port);
+	bool (*stream_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
 
 	/* SEQ_PACKET. */
 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
 				     int flags);
 	int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
 				 size_t len);
-	bool (*seqpacket_allow)(u32 remote_cid);
+	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
 
 	/* Notification. */
@@ -218,6 +218,13 @@ void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 					 struct sockaddr_vm *dst);
+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
+					 struct net *net,
+					 enum vsock_net_mode net_mode);
+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
+					     struct sockaddr_vm *dst,
+					     struct net *net,
+					     enum vsock_net_mode net_mode);
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(struct vsock_transport *transport,
 				     void (*fn)(struct sock *sk));
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index adcba1b7bf74..6113c22db8dc 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -83,6 +83,46 @@
  *   TCP_ESTABLISHED - connected
  *   TCP_CLOSING - disconnecting
  *   TCP_LISTEN - listening
+ *
+ * - Namespaces in vsock support two different modes configured
+ *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
+ *   Each mode defines how the namespace interacts with CIDs.
+ *   /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
+ *   and locked down by a namespace manager. The default is "global". The mode
+ *   is set per-namespace.
+ *
+ *   The modes affect the allocation and accessibility of CIDs as follows:
+ *
+ *   - global - access and allocation are all system-wide
+ *      - all CID allocations from global namespaces draw from the same
+ *        system-wide pool.
+ *      - if one global namespace has already allocated some CID, another
+ *        global namespace will not be able to allocate the same CID.
+ *      - global mode AF_VSOCK sockets can reach any VM or socket in any global
+ *        namespace; they are not confined to their own namespace.
+ *      - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
+ *        sockets in any local mode namespace.
+ *   - local - access and allocation are contained within the namespace
+ *     - CID allocation draws only from a private pool local only to the
+ *       namespace, and does not affect the CIDs available for allocation in any
+ *       other namespace (global or local).
+ *     - VMs in a local namespace do not collide with CIDs in any other local
+ *       namespace or any global namespace. For example, if a VM in a local mode
+ *       namespace is given CID 10, then CID 10 is still available for
+ *       allocation in any other namespace, but not in the same namespace.
+ *     - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
+ *       other sockets within their own namespace.
+ *     - sockets bound to VMADDR_CID_ANY in local namespaces will never resolve
+ *       to any transport that is not compatible with local mode. There is no
+ *       error that propagates to the user (as there is for connection attempts)
+ *       because it is possible for some packet to reach this socket from
+ *       a different transport that *does* support local mode. For
+ *       example, virtio-vsock may not support local mode, but the socket
+ *       may still accept a connection from vhost-vsock which does.
+ *
+ *   - when a socket or device is initialized in a namespace with mode
+ *     global, it will stay in global mode even if the namespace later
+ *     changes to local.
  */
 
 #include <linux/compat.h>
@@ -100,6 +140,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/net.h>
+#include <linux/proc_fs.h>
 #include <linux/poll.h>
 #include <linux/random.h>
 #include <linux/skbuff.h>
@@ -111,9 +152,18 @@
 #include <linux/workqueue.h>
 #include <net/sock.h>
 #include <net/af_vsock.h>
+#include <net/netns/vsock.h>
 #include <uapi/linux/vm_sockets.h>
 #include <uapi/asm-generic/ioctls.h>
 
+#define VSOCK_NET_MODE_STR_GLOBAL "global"
+#define VSOCK_NET_MODE_STR_LOCAL "local"
+
+/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
+ * The newline is added by proc_dostring() for read operations.
+ */
+#define VSOCK_NET_MODE_STR_MAX 8
+
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
 static void vsock_sk_destruct(struct sock *sk);
 static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
@@ -235,33 +285,47 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
 	sock_put(&vsk->sk);
 }
 
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
+						  struct net *net,
+						  enum vsock_net_mode net_mode)
 {
 	struct vsock_sock *vsk;
 
 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
-		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
-			return sk_vsock(vsk);
+		struct sock *sk = sk_vsock(vsk);
+
+		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
+		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
+					 net_mode))
+			return sk;
 
 		if (addr->svm_port == vsk->local_addr.svm_port &&
 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
-		     addr->svm_cid == VMADDR_CID_ANY))
-			return sk_vsock(vsk);
+		     addr->svm_cid == VMADDR_CID_ANY) &&
+		     vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
+					  net_mode))
+			return sk;
 	}
 
 	return NULL;
 }
 
-static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
-						  struct sockaddr_vm *dst)
+static struct sock *
+__vsock_find_connected_socket_net(struct sockaddr_vm *src,
+				  struct sockaddr_vm *dst, struct net *net,
+				  enum vsock_net_mode net_mode)
 {
 	struct vsock_sock *vsk;
 
 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
 			    connected_table) {
+		struct sock *sk = sk_vsock(vsk);
+
 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
-		    dst->svm_port == vsk->local_addr.svm_port) {
-			return sk_vsock(vsk);
+		    dst->svm_port == vsk->local_addr.svm_port &&
+		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
+					 net_mode)) {
+			return sk;
 		}
 	}
 
@@ -304,12 +368,14 @@ void vsock_remove_connected(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(vsock_remove_connected);
 
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
+					 struct net *net,
+					 enum vsock_net_mode net_mode)
 {
 	struct sock *sk;
 
 	spin_lock_bh(&vsock_table_lock);
-	sk = __vsock_find_bound_socket(addr);
+	sk = __vsock_find_bound_socket_net(addr, net, net_mode);
 	if (sk)
 		sock_hold(sk);
 
@@ -317,15 +383,23 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 
 	return sk;
 }
+EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
+
+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
+{
+	return vsock_find_bound_socket_net(addr, NULL, VSOCK_NET_MODE_GLOBAL);
+}
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
-struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
-					 struct sockaddr_vm *dst)
+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
+					     struct sockaddr_vm *dst,
+					     struct net *net,
+					     enum vsock_net_mode net_mode)
 {
 	struct sock *sk;
 
 	spin_lock_bh(&vsock_table_lock);
-	sk = __vsock_find_connected_socket(src, dst);
+	sk = __vsock_find_connected_socket_net(src, dst, net, net_mode);
 	if (sk)
 		sock_hold(sk);
 
@@ -333,6 +407,14 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 
 	return sk;
 }
+EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
+
+struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
+					 struct sockaddr_vm *dst)
+{
+	return vsock_find_connected_socket_net(src, dst,
+					       NULL, VSOCK_NET_MODE_GLOBAL);
+}
 EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
 
 void vsock_remove_sock(struct vsock_sock *vsk)
@@ -528,7 +610,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		if (!new_transport->seqpacket_allow ||
-		    !new_transport->seqpacket_allow(remote_cid)) {
+		    !new_transport->seqpacket_allow(vsk, remote_cid)) {
 			module_put(new_transport->module);
 			return -ESOCKTNOSUPPORT;
 		}
@@ -676,6 +758,7 @@ static void vsock_pending_work(struct work_struct *work)
 static int __vsock_bind_connectible(struct vsock_sock *vsk,
 				    struct sockaddr_vm *addr)
 {
+	struct net *net = sock_net(sk_vsock(vsk));
 	static u32 port;
 	struct sockaddr_vm new_addr;
 
@@ -695,7 +778,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 
 			new_addr.svm_port = port++;
 
-			if (!__vsock_find_bound_socket(&new_addr)) {
+			if (!__vsock_find_bound_socket_net(&new_addr, net,
+							   vsk->net_mode)) {
 				found = true;
 				break;
 			}
@@ -712,7 +796,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 			return -EACCES;
 		}
 
-		if (__vsock_find_bound_socket(&new_addr))
+		if (__vsock_find_bound_socket_net(&new_addr, net,
+						  vsk->net_mode))
 			return -EADDRINUSE;
 	}
 
@@ -836,6 +921,8 @@ static struct sock *__vsock_create(struct net *net,
 		vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE;
 	}
 
+	vsk->net_mode = vsock_net_mode(net);
+
 	return sk;
 }
 
@@ -1314,7 +1401,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out;
 	}
 
-	if (!transport->dgram_allow(remote_addr->svm_cid,
+	if (!transport->dgram_allow(vsk, remote_addr->svm_cid,
 				    remote_addr->svm_port)) {
 		err = -EINVAL;
 		goto out;
@@ -1355,7 +1442,7 @@ static int vsock_dgram_connect(struct socket *sock,
 	if (err)
 		goto out;
 
-	if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
+	if (!vsk->transport->dgram_allow(vsk, remote_addr->svm_cid,
 					 remote_addr->svm_port)) {
 		err = -EINVAL;
 		goto out;
@@ -1585,7 +1672,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr_unsized *addr,
 		 * endpoints.
 		 */
 		if (!transport ||
-		    !transport->stream_allow(remote_addr->svm_cid,
+		    !transport->stream_allow(vsk, remote_addr->svm_cid,
 					     remote_addr->svm_port)) {
 			err = -ENETUNREACH;
 			goto out;
@@ -2658,6 +2745,142 @@ static struct miscdevice vsock_device = {
 	.fops		= &vsock_device_ops,
 };
 
+static int vsock_net_mode_string(const struct ctl_table *table, int write,
+				 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	char data[VSOCK_NET_MODE_STR_MAX] = {0};
+	enum vsock_net_mode mode;
+	struct ctl_table tmp;
+	struct net *net;
+	int ret;
+
+	if (!table->data || !table->maxlen || !*lenp) {
+		*lenp = 0;
+		return 0;
+	}
+
+	net = current->nsproxy->net_ns;
+	tmp = *table;
+	tmp.data = data;
+
+	if (!write) {
+		const char *p;
+
+		mode = vsock_net_mode(net);
+
+		switch (mode) {
+		case VSOCK_NET_MODE_GLOBAL:
+			p = VSOCK_NET_MODE_STR_GLOBAL;
+			break;
+		case VSOCK_NET_MODE_LOCAL:
+			p = VSOCK_NET_MODE_STR_LOCAL;
+			break;
+		default:
+			WARN_ONCE(true, "netns has invalid vsock mode");
+			*lenp = 0;
+			return 0;
+		}
+
+		strscpy(data, p, sizeof(data));
+		tmp.maxlen = strlen(p);
+	}
+
+	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	if (!write)
+		return 0;
+
+	if (*lenp >= sizeof(data))
+		return -EINVAL;
+
+	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
+		mode = VSOCK_NET_MODE_GLOBAL;
+	else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
+		mode = VSOCK_NET_MODE_LOCAL;
+	else
+		return -EINVAL;
+
+	if (!vsock_net_write_mode(net, mode))
+		return -EPERM;
+
+	return 0;
+}
+
+static struct ctl_table vsock_table[] = {
+	{
+		.procname	= "ns_mode",
+		.data		= &init_net.vsock.mode,
+		.maxlen		= VSOCK_NET_MODE_STR_MAX,
+		.mode		= 0644,
+		.proc_handler	= vsock_net_mode_string
+	},
+};
+
+static int __net_init vsock_sysctl_register(struct net *net)
+{
+	struct ctl_table *table;
+
+	if (net_eq(net, &init_net)) {
+		table = vsock_table;
+	} else {
+		table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
+		if (!table)
+			goto err_alloc;
+
+		table[0].data = &net->vsock.mode;
+	}
+
+	net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table,
+						       ARRAY_SIZE(vsock_table));
+	if (!net->vsock.sysctl_hdr)
+		goto err_reg;
+
+	return 0;
+
+err_reg:
+	if (!net_eq(net, &init_net))
+		kfree(table);
+err_alloc:
+	return -ENOMEM;
+}
+
+static void vsock_sysctl_unregister(struct net *net)
+{
+	const struct ctl_table *table;
+
+	table = net->vsock.sysctl_hdr->ctl_table_arg;
+	unregister_net_sysctl_table(net->vsock.sysctl_hdr);
+	if (!net_eq(net, &init_net))
+		kfree(table);
+}
+
+static void vsock_net_init(struct net *net)
+{
+	net->vsock.mode = VSOCK_NET_MODE_GLOBAL;
+}
+
+static __net_init int vsock_sysctl_init_net(struct net *net)
+{
+	vsock_net_init(net);
+
+	if (vsock_sysctl_register(net))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static __net_exit void vsock_sysctl_exit_net(struct net *net)
+{
+	vsock_sysctl_unregister(net);
+}
+
+static struct pernet_operations vsock_sysctl_ops __net_initdata = {
+	.init = vsock_sysctl_init_net,
+	.exit = vsock_sysctl_exit_net,
+};
+
 static int __init vsock_init(void)
 {
 	int err = 0;
@@ -2685,10 +2908,18 @@ static int __init vsock_init(void)
 		goto err_unregister_proto;
 	}
 
+	if (register_pernet_subsys(&vsock_sysctl_ops)) {
+		err = -ENOMEM;
+		goto err_unregister_sock;
+	}
+
+	vsock_net_init(&init_net);
 	vsock_bpf_build_proto();
 
 	return 0;
 
+err_unregister_sock:
+	sock_unregister(AF_VSOCK);
 err_unregister_proto:
 	proto_unregister(&vsock_proto);
 err_deregister_misc:
@@ -2702,6 +2933,7 @@ static void __exit vsock_exit(void)
 	misc_deregister(&vsock_device);
 	sock_unregister(AF_VSOCK);
 	proto_unregister(&vsock_proto);
+	unregister_pernet_subsys(&vsock_sysctl_ops);
 }
 
 const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 432fcbbd14d4..b2ade188c8c7 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -570,7 +570,7 @@ static int hvs_dgram_enqueue(struct vsock_sock *vsk,
 	return -EOPNOTSUPP;
 }
 
-static bool hvs_dgram_allow(u32 cid, u32 port)
+static bool hvs_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port)
 {
 	return false;
 }
@@ -745,8 +745,11 @@ static bool hvs_stream_is_active(struct vsock_sock *vsk)
 	return hvs->chan != NULL;
 }
 
-static bool hvs_stream_allow(u32 cid, u32 port)
+static bool hvs_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
 {
+	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
+		return false;
+
 	if (cid == VMADDR_CID_HOST)
 		return true;
 
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 8c867023a2e5..f5123810192d 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -536,7 +536,8 @@ static bool virtio_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk,
+					     u32 remote_cid);
 
 static struct virtio_transport virtio_transport = {
 	.transport = {
@@ -593,11 +594,15 @@ static struct virtio_transport virtio_transport = {
 	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
 };
 
-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
+static bool
+virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	struct virtio_vsock *vsock;
 	bool seqpacket_allow;
 
+	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
+		return false;
+
 	seqpacket_allow = false;
 	rcu_read_lock();
 	vsock = rcu_dereference(the_virtio_vsock);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index dcc8a1d5851e..e6391eb7cc1b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1043,9 +1043,9 @@ bool virtio_transport_stream_is_active(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(virtio_transport_stream_is_active);
 
-bool virtio_transport_stream_allow(u32 cid, u32 port)
+bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
 {
-	return true;
+	return vsk->net_mode == VSOCK_NET_MODE_GLOBAL;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
 
@@ -1056,7 +1056,7 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
 
-bool virtio_transport_dgram_allow(u32 cid, u32 port)
+bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port)
 {
 	return false;
 }
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 7eccd6708d66..0ce44dc11708 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -646,13 +646,17 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
 	return VMCI_SUCCESS;
 }
 
-static bool vmci_transport_stream_allow(u32 cid, u32 port)
+static bool vmci_transport_stream_allow(struct vsock_sock *vsk, u32 cid,
+					u32 port)
 {
 	static const u32 non_socket_contexts[] = {
 		VMADDR_CID_LOCAL,
 	};
 	int i;
 
+	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
+		return false;
+
 	BUILD_BUG_ON(sizeof(cid) != sizeof(*non_socket_contexts));
 
 	for (i = 0; i < ARRAY_SIZE(non_socket_contexts); i++) {
@@ -682,12 +686,10 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 	err = VMCI_SUCCESS;
 	bh_process_pkt = false;
 
-	/* Ignore incoming packets from contexts without sockets, or resources
-	 * that aren't vsock implementations.
+	/* Ignore incoming packets from resources that aren't vsock
+	 * implementations.
 	 */
-
-	if (!vmci_transport_stream_allow(dg->src.context, -1)
-	    || vmci_transport_peer_rid(dg->src.context) != dg->src.resource)
+	if (vmci_transport_peer_rid(dg->src.context) != dg->src.resource)
 		return VMCI_ERROR_NO_ACCESS;
 
 	if (VMCI_DG_SIZE(dg) < sizeof(*pkt))
@@ -749,6 +751,12 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 		goto out;
 	}
 
+	/* Ignore incoming packets from contexts without sockets. */
+	if (!vmci_transport_stream_allow(vsk, dg->src.context, -1)) {
+		err = VMCI_ERROR_NO_ACCESS;
+		goto out;
+	}
+
 	/* We do most everything in a work queue, but let's fast path the
 	 * notification of reads and writes to help data transfer performance.
 	 * We can only do this if there is no process context code executing
@@ -1784,8 +1792,12 @@ static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
 	return err;
 }
 
-static bool vmci_transport_dgram_allow(u32 cid, u32 port)
+static bool vmci_transport_dgram_allow(struct vsock_sock *vsk, u32 cid,
+				       u32 port)
 {
+	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
+		return false;
+
 	if (cid == VMADDR_CID_HYPERVISOR) {
 		/* Registrations of PBRPC Servers do not modify VMX/Hypervisor
 		 * state and are allowed.
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index bc2ff918b315..afad27cf533a 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,7 +46,8 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
 	return 0;
 }
 
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk,
+					   u32 remote_cid);
 static bool vsock_loopback_msgzerocopy_allow(void)
 {
 	return true;
@@ -106,9 +107,10 @@ static struct virtio_transport loopback_transport = {
 	.send_pkt = vsock_loopback_send_pkt,
 };
 
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
+static bool
+vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
-	return true;
+	return vsk->net_mode == VSOCK_NET_MODE_GLOBAL;
 }
 
 static void vsock_loopback_work(struct work_struct *work)

-- 
2.47.3
Re: [PATCH net-next v12 02/12] vsock: add netns to vsock core
Posted by Stefano Garzarella 4 days, 7 hours ago
On Wed, Nov 26, 2025 at 11:47:31PM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@meta.com>
>
>Add netns logic to vsock core. Additionally, modify transport hook
>prototypes to be used by later transport-specific patches (e.g.,
>*_seqpacket_allow()).
>
>Namespaces are supported primarily by changing socket lookup functions
>(e.g., vsock_find_connected_socket()) to take into account the socket
>namespace and the namespace mode before considering a candidate socket a
>"match".
>
>This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
>accepts the "global" or "local" mode strings.
>
>Add netns functionality (initialization, passing to transports, procfs,
>etc...) to the af_vsock socket layer. Later patches that add netns
>support to transports depend on this patch.
>
>dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
>modified to take a vsk in order to perform logic on namespace modes. In
>future patches, the net and net_mode will also be used for socket
>lookups in these functions.
>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>---
>Changes in v12:
>- return true in dgram_allow(), stream_allow(), and seqpacket_allow()
>  only if net_mode == VSOCK_NET_MODE_GLOBAL (Stefano)
>- document bind(VMADDR_CID_ANY) case in af_vsock.c (Stefano)
>- change order of stream_allow() call in vmci so we can pass vsk
>  to it
>
>Changes in v10:
>- add file-level comment about what happens to sockets/devices
>  when the namespace mode changes (Stefano)
>- change the 'if (write)' boolean in vsock_net_mode_string() to
>  if (!write), this simplifies a later patch which adds "goto"
>  for mutex unlocking on function exit.
>
>Changes in v9:
>- remove virtio_vsock_alloc_rx_skb() (Stefano)
>- remove vsock_global_dummy_net, not needed as net=NULL +
>  net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result
>
>Changes in v7:
>- hv_sock: fix hyperv build error
>- explain why vhost does not use the dummy
>- explain usage of __vsock_global_dummy_net
>- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
>- use switch-case in vsock_net_mode_string()
>- avoid changing transports as much as possible
>- add vsock_find_{bound,connected}_socket_net()
>- rename `vsock_hdr` to `sysctl_hdr`
>- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
>  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
>- explain seqpacket_allow() change
>- move net setting to __vsock_create() instead of vsock_create() so
>  that child sockets also have their net assigned upon accept()
>
>Changes in v6:
>- unregister sysctl ops in vsock_exit()
>- af_vsock: clarify description of CID behavior
>- af_vsock: fix buf vs buffer naming, and length checking
>- af_vsock: fix length checking w/ correct ctl_table->maxlen
>
>Changes in v5:
>- vsock_global_net() -> vsock_global_dummy_net()
>- update comments for new uAPI
>- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
>- add prototype changes so patch remains compilable
>---
> drivers/vhost/vsock.c                   |   9 +-
> include/linux/virtio_vsock.h            |   4 +-
> include/net/af_vsock.h                  |  13 +-
> net/vmw_vsock/af_vsock.c                | 272 +++++++++++++++++++++++++++++---
> net/vmw_vsock/hyperv_transport.c        |   7 +-
> net/vmw_vsock/virtio_transport.c        |   9 +-
> net/vmw_vsock/virtio_transport_common.c |   6 +-
> net/vmw_vsock/vmci_transport.c          |  26 ++-
> net/vmw_vsock/vsock_loopback.c          |   8 +-
> 9 files changed, 310 insertions(+), 44 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index ae01457ea2cd..83937e1d63fa 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -404,7 +404,8 @@ static bool vhost_transport_msgzerocopy_allow(void)
> 	return true;
> }
>
>-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
>+					    u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> 	.transport = {
>@@ -460,11 +461,15 @@ static struct virtio_transport vhost_transport = {
> 	.send_pkt = vhost_transport_send_pkt,
> };
>
>-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
>+					    u32 remote_cid)
> {
> 	struct vhost_vsock *vsock;
> 	bool seqpacket_allow = false;
>
>+	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
>+		return false;
>+
> 	rcu_read_lock();
> 	vsock = vhost_vsock_get(remote_cid);
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 0c67543a45c8..1845e8d4f78d 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -256,10 +256,10 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>-bool virtio_transport_stream_allow(u32 cid, u32 port);
>+bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> 				struct sockaddr_vm *addr);
>-bool virtio_transport_dgram_allow(u32 cid, u32 port);
>+bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port);
>
> int virtio_transport_connect(struct vsock_sock *vsk);
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 9b5bdd083b6f..d10e73cd7413 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -126,7 +126,7 @@ struct vsock_transport {
> 			     size_t len, int flags);
> 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> 			     struct msghdr *, size_t len);
>-	bool (*dgram_allow)(u32 cid, u32 port);
>+	bool (*dgram_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
>
> 	/* STREAM. */
> 	/* TODO: stream_bind() */
>@@ -138,14 +138,14 @@ struct vsock_transport {
> 	s64 (*stream_has_space)(struct vsock_sock *);
> 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
> 	bool (*stream_is_active)(struct vsock_sock *);
>-	bool (*stream_allow)(u32 cid, u32 port);
>+	bool (*stream_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
>
> 	/* SEQ_PACKET. */
> 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> 				     int flags);
> 	int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> 				 size_t len);
>-	bool (*seqpacket_allow)(u32 remote_cid);
>+	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
>
> 	/* Notification. */
>@@ -218,6 +218,13 @@ void vsock_remove_connected(struct vsock_sock *vsk);
> struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> 					 struct sockaddr_vm *dst);
>+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
>+					 struct net *net,
>+					 enum vsock_net_mode net_mode);
>+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
>+					     struct sockaddr_vm *dst,
>+					     struct net *net,
>+					     enum vsock_net_mode net_mode);
> void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(struct vsock_transport *transport,
> 				     void (*fn)(struct sock *sk));
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index adcba1b7bf74..6113c22db8dc 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -83,6 +83,46 @@
>  *   TCP_ESTABLISHED - connected
>  *   TCP_CLOSING - disconnecting
>  *   TCP_LISTEN - listening
>+ *
>+ * - Namespaces in vsock support two different modes configured
>+ *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
>+ *   Each mode defines how the namespace interacts with CIDs.
>+ *   /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
>+ *   and locked down by a namespace manager. The default is "global". The mode
>+ *   is set per-namespace.
>+ *
>+ *   The modes affect the allocation and accessibility of CIDs as follows:
>+ *
>+ *   - global - access and allocation are all system-wide

nit: maybe we should mention that this mode is primarily for backward
compatibility, since it's how vsock worked before netns support.

(We can fix this later with a follow-up patch.)

>+ *      - all CID allocation from global namespaces draw from the same
>+ *        system-wide pool.
>+ *      - if one global namespace has already allocated some CID, another
>+ *        global namespace will not be able to allocate the same CID.
>+ *      - global mode AF_VSOCK sockets can reach any VM or socket in any global
>+ *        namespace, they are not contained to only their own namespace.
>+ *      - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
>+ *        sockets in any local mode namespace.
>+ *   - local - access and allocation are contained within the namespace
>+ *     - CID allocation draws only from a private pool local only to the
>+ *       namespace, and does not affect the CIDs available for allocation in any
>+ *       other namespace (global or local).
>+ *     - VMs in a local namespace do not collide with CIDs in any other local
>+ *       namespace or any global namespace. For example, if a VM in a local mode
>+ *       namespace is given CID 10, then CID 10 is still available for
>+ *       allocation in any other namespace, but not in the same namespace.
>+ *     - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
>+ *       other sockets within their own namespace.
>+ *     - sockets bound to VMADDR_CID_ANY in local namespaces will never resolve
>+ *       to any transport that is not compatible with local mode. There is no
>+ *       error that propagates to the user (as there is for connection attempts)
>+ *       because it is possible for some packet to reach this socket from
>+ *       a different transport that *does* support local mode. For
>+ *       example, virtio-vsock may not support local mode, but the socket
>+ *       may still accept a connection from vhost-vsock which does.
>+ *
>+ *   - when a socket or device is initialized in a namespace with mode
>+ *     global, it will stay in global mode even if the namespace later
>+ *     changes to local.
>  */
>
> #include <linux/compat.h>
>@@ -100,6 +140,7 @@
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/net.h>
>+#include <linux/proc_fs.h>
> #include <linux/poll.h>
> #include <linux/random.h>
> #include <linux/skbuff.h>
>@@ -111,9 +152,18 @@
> #include <linux/workqueue.h>
> #include <net/sock.h>
> #include <net/af_vsock.h>
>+#include <net/netns/vsock.h>
> #include <uapi/linux/vm_sockets.h>
> #include <uapi/asm-generic/ioctls.h>
>
>+#define VSOCK_NET_MODE_STR_GLOBAL "global"
>+#define VSOCK_NET_MODE_STR_LOCAL "local"
>+
>+/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
>+ * The newline is added by proc_dostring() for read operations.
>+ */
>+#define VSOCK_NET_MODE_STR_MAX 8
>+
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>@@ -235,33 +285,47 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> 	sock_put(&vsk->sk);
> }
>
>-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
>+						  struct net *net,
>+						  enum vsock_net_mode net_mode)
> {
> 	struct vsock_sock *vsk;
>
> 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>-		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
>-			return sk_vsock(vsk);
>+		struct sock *sk = sk_vsock(vsk);
>+
>+		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
>+		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
>+					 net_mode))
>+			return sk;
>
> 		if (addr->svm_port == vsk->local_addr.svm_port &&
> 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
>-		     addr->svm_cid == VMADDR_CID_ANY))
>-			return sk_vsock(vsk);
>+		     addr->svm_cid == VMADDR_CID_ANY) &&
>+		     vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
>+					  net_mode))
>+			return sk;
> 	}
>
> 	return NULL;
> }
>
>-static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>-						  struct sockaddr_vm *dst)
>+static struct sock *
>+__vsock_find_connected_socket_net(struct sockaddr_vm *src,
>+				  struct sockaddr_vm *dst, struct net *net,
>+				  enum vsock_net_mode net_mode)
> {
> 	struct vsock_sock *vsk;
>
> 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> 			    connected_table) {
>+		struct sock *sk = sk_vsock(vsk);
>+
> 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
>-		    dst->svm_port == vsk->local_addr.svm_port) {
>-			return sk_vsock(vsk);
>+		    dst->svm_port == vsk->local_addr.svm_port &&
>+		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
>+					 net_mode)) {
>+			return sk;
> 		}
> 	}
>
>@@ -304,12 +368,14 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_remove_connected);
>
>-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
>+					 struct net *net,
>+					 enum vsock_net_mode net_mode)
> {
> 	struct sock *sk;
>
> 	spin_lock_bh(&vsock_table_lock);
>-	sk = __vsock_find_bound_socket(addr);
>+	sk = __vsock_find_bound_socket_net(addr, net, net_mode);
> 	if (sk)
> 		sock_hold(sk);
>
>@@ -317,15 +383,23 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>
> 	return sk;
> }
>+EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
>+
>+struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
>+{
>+	return vsock_find_bound_socket_net(addr, NULL, VSOCK_NET_MODE_GLOBAL);

The patch LGTM, my last doubt now is if here (and in 
vsock_find_connected_socket() ) we should use `init_net`.

In practice, this is the namespace (NULL) and mode (GLOBAL) used by 
transports that do not support namespaces.

So here we are making them belong to no namespace, so they can only 
reach global ones. When any namespace, including `init_net`, switches to 
local, it can no longer be reached by transports that do not support 
local namespaces, because in practice we still do not have a way to 
associate a device (in the case of drivers) with a specific namespace.  
Right?

If I get it right, it can make sense, but I'd like an ack from the net
maintainers to be sure we are doing the right thing.

Also I think we should have a comment on top of this function to make it
clear that it should be used only by transports that don't support
namespaces, and to explain why we used NULL and GLOBAL. Plus a comment at
the top of this file (near where we described local vs global) to clarify
the status of this.

That said, if net-next closes next week, I think we can send a
follow-up patch just for those comments, so:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>+}
> EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
>
>-struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>-					 struct sockaddr_vm *dst)
>+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
>+					     struct sockaddr_vm *dst,
>+					     struct net *net,
>+					     enum vsock_net_mode net_mode)
> {
> 	struct sock *sk;
>
> 	spin_lock_bh(&vsock_table_lock);
>-	sk = __vsock_find_connected_socket(src, dst);
>+	sk = __vsock_find_connected_socket_net(src, dst, net, net_mode);
> 	if (sk)
> 		sock_hold(sk);
>
>@@ -333,6 +407,14 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>
> 	return sk;
> }
>+EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
>+
>+struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>+					 struct sockaddr_vm *dst)
>+{
>+	return vsock_find_connected_socket_net(src, dst,
>+					       NULL, VSOCK_NET_MODE_GLOBAL);
>+}
> EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
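
The matching rules from the file-level comment, and the NULL + GLOBAL wrapper discussed above, can be sketched in user-space C. This is only an illustration under stated assumptions: the types and the names net_check_mode(), find_bound() and find_bound_legacy() are hypothetical stand-ins, and the real vsock_net_check_mode() is not shown in this hunk, so its exact logic may differ. The sketch assumes that two global-mode parties always match, two local-mode parties match only within the same namespace, and mixed modes never match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum net_mode { NET_MODE_GLOBAL, NET_MODE_LOCAL };
struct netns { int id; };

struct bound_sock {
	unsigned int cid;
	unsigned int port;
	const struct netns *net;
	enum net_mode mode;
};

/* Assumed rule set, derived from the file-level comment only:
 * global <-> global always matches, local <-> local matches only
 * inside the same namespace, mixed modes never match.
 */
static bool net_check_mode(const struct netns *n1, enum net_mode m1,
			   const struct netns *n2, enum net_mode m2)
{
	if (m1 == NET_MODE_GLOBAL && m2 == NET_MODE_GLOBAL)
		return true;
	if (m1 == NET_MODE_LOCAL && m2 == NET_MODE_LOCAL)
		return n1 != NULL && n1 == n2;
	return false;
}

/* Namespace-aware lookup: address match plus mode/namespace match. */
static const struct bound_sock *
find_bound(const struct bound_sock *tbl, size_t n,
	   unsigned int cid, unsigned int port,
	   const struct netns *net, enum net_mode mode)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cid == cid && tbl[i].port == port &&
		    net_check_mode(tbl[i].net, tbl[i].mode, net, mode))
			return &tbl[i];
	}
	return NULL;
}

/* Legacy entry point for callers without namespace support:
 * no namespace (NULL) and global mode, so only global sockets match.
 */
static const struct bound_sock *
find_bound_legacy(const struct bound_sock *tbl, size_t n,
		  unsigned int cid, unsigned int port)
{
	return find_bound(tbl, n, cid, port, NULL, NET_MODE_GLOBAL);
}
```

With a table holding one local-mode and one global-mode socket on the same CID and port, find_bound_legacy() returns only the global one, which is the "can only reach global ones" behaviour described in the review comment.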
Re: [PATCH net-next v12 02/12] vsock: add netns to vsock core
Posted by Bobby Eshleman 4 days, 5 hours ago
On Thu, Nov 27, 2025 at 03:25:32PM +0100, Stefano Garzarella wrote:
> On Wed, Nov 26, 2025 at 11:47:31PM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Add netns logic to vsock core. Additionally, modify transport hook
> > prototypes to be used by later transport-specific patches (e.g.,
> > *_seqpacket_allow()).
> > 
> > Namespaces are supported primarily by changing socket lookup functions
> > (e.g., vsock_find_connected_socket()) to take into account the socket
> > namespace and the namespace mode before considering a candidate socket a
> > "match".
> > 
> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
> > accepts the "global" or "local" mode strings.
> > 
> > Add netns functionality (initialization, passing to transports, procfs,
> > etc...) to the af_vsock socket layer. Later patches that add netns
> > support to transports depend on this patch.
> > 
> > dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
> > modified to take a vsk in order to perform logic on namespace modes. In
> > future patches, the net and net_mode will also be used for socket
> > lookups in these functions.
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v12:
> > - return true in dgram_allow(), stream_allow(), and seqpacket_allow()
> >  only if net_mode == VSOCK_NET_MODE_GLOBAL (Stefano)
> > - document bind(VMADDR_CID_ANY) case in af_vsock.c (Stefano)
> > - change order of stream_allow() call in vmci so we can pass vsk
> >  to it
> > 
> > Changes in v10:
> > - add file-level comment about what happens to sockets/devices
> >  when the namespace mode changes (Stefano)
> > - change the 'if (write)' boolean in vsock_net_mode_string() to
> >  if (!write), this simplifies a later patch which adds "goto"
> >  for mutex unlocking on function exit.
> > 
> > Changes in v9:
> > - remove virtio_vsock_alloc_rx_skb() (Stefano)
> > - remove vsock_global_dummy_net, not needed as net=NULL +
> >  net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result
> > 
> > Changes in v7:
> > - hv_sock: fix hyperv build error
> > - explain why vhost does not use the dummy
> > - explain usage of __vsock_global_dummy_net
> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> > - use switch-case in vsock_net_mode_string()
> > - avoid changing transports as much as possible
> > - add vsock_find_{bound,connected}_socket_net()
> > - rename `vsock_hdr` to `sysctl_hdr`
> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
> >  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> > - explain seqpacket_allow() change
> > - move net setting to __vsock_create() instead of vsock_create() so
> >  that child sockets also have their net assigned upon accept()
> > 
> > Changes in v6:
> > - unregister sysctl ops in vsock_exit()
> > - af_vsock: clarify description of CID behavior
> > - af_vsock: fix buf vs buffer naming, and length checking
> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
> > 
> > Changes in v5:
> > - vsock_global_net() -> vsock_global_dummy_net()
> > - update comments for new uAPI
> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> > - add prototype changes so patch remains compilable
> > ---
> > drivers/vhost/vsock.c                   |   9 +-
> > include/linux/virtio_vsock.h            |   4 +-
> > include/net/af_vsock.h                  |  13 +-
> > net/vmw_vsock/af_vsock.c                | 272 +++++++++++++++++++++++++++++---
> > net/vmw_vsock/hyperv_transport.c        |   7 +-
> > net/vmw_vsock/virtio_transport.c        |   9 +-
> > net/vmw_vsock/virtio_transport_common.c |   6 +-
> > net/vmw_vsock/vmci_transport.c          |  26 ++-
> > net/vmw_vsock/vsock_loopback.c          |   8 +-
> > 9 files changed, 310 insertions(+), 44 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ae01457ea2cd..83937e1d63fa 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -404,7 +404,8 @@ static bool vhost_transport_msgzerocopy_allow(void)
> > 	return true;
> > }
> > 
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> > +					    u32 remote_cid);
> > 
> > static struct virtio_transport vhost_transport = {
> > 	.transport = {
> > @@ -460,11 +461,15 @@ static struct virtio_transport vhost_transport = {
> > 	.send_pkt = vhost_transport_send_pkt,
> > };
> > 
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> > +					    u32 remote_cid)
> > {
> > 	struct vhost_vsock *vsock;
> > 	bool seqpacket_allow = false;
> > 
> > +	if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
> > +		return false;
> > +
> > 	rcu_read_lock();
> > 	vsock = vhost_vsock_get(remote_cid);
> > 
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 0c67543a45c8..1845e8d4f78d 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -256,10 +256,10 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > 
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > -bool virtio_transport_stream_allow(u32 cid, u32 port);
> > +bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> > int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > 				struct sockaddr_vm *addr);
> > -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > +bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> > 
> > int virtio_transport_connect(struct vsock_sock *vsk);
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 9b5bdd083b6f..d10e73cd7413 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -126,7 +126,7 @@ struct vsock_transport {
> > 			     size_t len, int flags);
> > 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > 			     struct msghdr *, size_t len);
> > -	bool (*dgram_allow)(u32 cid, u32 port);
> > +	bool (*dgram_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
> > 
> > 	/* STREAM. */
> > 	/* TODO: stream_bind() */
> > @@ -138,14 +138,14 @@ struct vsock_transport {
> > 	s64 (*stream_has_space)(struct vsock_sock *);
> > 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
> > 	bool (*stream_is_active)(struct vsock_sock *);
> > -	bool (*stream_allow)(u32 cid, u32 port);
> > +	bool (*stream_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
> > 
> > 	/* SEQ_PACKET. */
> > 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > 				     int flags);
> > 	int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> > 				 size_t len);
> > -	bool (*seqpacket_allow)(u32 remote_cid);
> > +	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> > 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
> > 
> > 	/* Notification. */
> > @@ -218,6 +218,13 @@ void vsock_remove_connected(struct vsock_sock *vsk);
> > struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> > struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > 					 struct sockaddr_vm *dst);
> > +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > +					 struct net *net,
> > +					 enum vsock_net_mode net_mode);
> > +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > +					     struct sockaddr_vm *dst,
> > +					     struct net *net,
> > +					     enum vsock_net_mode net_mode);
> > void vsock_remove_sock(struct vsock_sock *vsk);
> > void vsock_for_each_connected_socket(struct vsock_transport *transport,
> > 				     void (*fn)(struct sock *sk));
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index adcba1b7bf74..6113c22db8dc 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -83,6 +83,46 @@
> >  *   TCP_ESTABLISHED - connected
> >  *   TCP_CLOSING - disconnecting
> >  *   TCP_LISTEN - listening
> > + *
> > + * - Namespaces in vsock support two different modes configured
> > + *   through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
> > + *   Each mode defines how the namespace interacts with CIDs.
> > + *   /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
> > + *   and locked down by a namespace manager. The default is "global". The mode
> > + *   is set per-namespace.
> > + *
> > + *   The modes affect the allocation and accessibility of CIDs as follows:
> > + *
> > + *   - global - access and allocation are all system-wide
> 
> nit: maybe we should mention that this mode is primarily for backward
> compatibility, since it's how vsock worked before netns support.
> 
> (We can fix this later with a follow-up patch.)
> 
> > + *      - all CID allocation from global namespaces draw from the same
> > + *        system-wide pool.
> > + *      - if one global namespace has already allocated some CID, another
> > + *        global namespace will not be able to allocate the same CID.
> > + *      - global mode AF_VSOCK sockets can reach any VM or socket in any global
> > + *        namespace, they are not contained to only their own namespace.
> > + *      - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
> > + *        sockets in any local mode namespace.
> > + *   - local - access and allocation are contained within the namespace
> > + *     - CID allocation draws only from a private pool local only to the
> > + *       namespace, and does not affect the CIDs available for allocation in any
> > + *       other namespace (global or local).
> > + *     - VMs in a local namespace do not collide with CIDs in any other local
> > + *       namespace or any global namespace. For example, if a VM in a local mode
> > + *       namespace is given CID 10, then CID 10 is still available for
> > + *       allocation in any other namespace, but not in the same namespace.
> > + *     - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
> > + *       other sockets within their own namespace.
> > + *     - sockets bound to VMADDR_CID_ANY in local namespaces will never resolve
> > + *       to any transport that is not compatible with local mode. There is no
> > + *       error that propagates to the user (as there is for connection attempts)
> > + *       because it is possible for some packet to reach this socket from
> > + *       a different transport that *does* support local mode. For
> > + *       example, virtio-vsock may not support local mode, but the socket
> > + *       may still accept a connection from vhost-vsock which does.
> > + *
> > + *   - when a socket or device is initialized in a namespace with mode
> > + *     global, it will stay in global mode even if the namespace later
> > + *     changes to local.
> >  */
> > 
> > #include <linux/compat.h>
> > @@ -100,6 +140,7 @@
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/net.h>
> > +#include <linux/proc_fs.h>
> > #include <linux/poll.h>
> > #include <linux/random.h>
> > #include <linux/skbuff.h>
> > @@ -111,9 +152,18 @@
> > #include <linux/workqueue.h>
> > #include <net/sock.h>
> > #include <net/af_vsock.h>
> > +#include <net/netns/vsock.h>
> > #include <uapi/linux/vm_sockets.h>
> > #include <uapi/asm-generic/ioctls.h>
> > 
> > +#define VSOCK_NET_MODE_STR_GLOBAL "global"
> > +#define VSOCK_NET_MODE_STR_LOCAL "local"
> > +
> > +/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
> > + * The newline is added by proc_dostring() for read operations.
> > + */
> > +#define VSOCK_NET_MODE_STR_MAX 8
> > +
> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> > static void vsock_sk_destruct(struct sock *sk);
> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> > @@ -235,33 +285,47 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > 	sock_put(&vsk->sk);
> > }
> > 
> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > +						  struct net *net,
> > +						  enum vsock_net_mode net_mode)
> > {
> > 	struct vsock_sock *vsk;
> > 
> > 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> > -		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> > -			return sk_vsock(vsk);
> > +		struct sock *sk = sk_vsock(vsk);
> > +
> > +		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
> > +		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
> > +					 net_mode))
> > +			return sk;
> > 
> > 		if (addr->svm_port == vsk->local_addr.svm_port &&
> > 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
> > -		     addr->svm_cid == VMADDR_CID_ANY))
> > -			return sk_vsock(vsk);
> > +		     addr->svm_cid == VMADDR_CID_ANY) &&
> > +		     vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
> > +					  net_mode))
> > +			return sk;
> > 	}
> > 
> > 	return NULL;
> > }
> > 
> > -static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > -						  struct sockaddr_vm *dst)
> > +static struct sock *
> > +__vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > +				  struct sockaddr_vm *dst, struct net *net,
> > +				  enum vsock_net_mode net_mode)
> > {
> > 	struct vsock_sock *vsk;
> > 
> > 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> > 			    connected_table) {
> > +		struct sock *sk = sk_vsock(vsk);
> > +
> > 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
> > -		    dst->svm_port == vsk->local_addr.svm_port) {
> > -			return sk_vsock(vsk);
> > +		    dst->svm_port == vsk->local_addr.svm_port &&
> > +		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
> > +					 net_mode)) {
> > +			return sk;
> > 		}
> > 	}
> > 
> > @@ -304,12 +368,14 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> > }
> > EXPORT_SYMBOL_GPL(vsock_remove_connected);
> > 
> > -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > +					 struct net *net,
> > +					 enum vsock_net_mode net_mode)
> > {
> > 	struct sock *sk;
> > 
> > 	spin_lock_bh(&vsock_table_lock);
> > -	sk = __vsock_find_bound_socket(addr);
> > +	sk = __vsock_find_bound_socket_net(addr, net, net_mode);
> > 	if (sk)
> > 		sock_hold(sk);
> > 
> > @@ -317,15 +383,23 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > 
> > 	return sk;
> > }
> > +EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
> > +
> > +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +{
> > +	return vsock_find_bound_socket_net(addr, NULL, VSOCK_NET_MODE_GLOBAL);
> 
> The patch LGTM, my last doubt now is if here (and in
> vsock_find_connected_socket() ) we should use `init_net`.
> 
> In practice, this is the namespace (NULL) and mode (GLOBAL) used by
> transports that do not support namespaces.
> 
> So here we are making them belong to no namespace, so they can only reach
> global ones. When any namespace, including `init_net`, switches to local, it
> can no longer be reached by transports that do not support local namespaces,
> because in practice we still do not have a way to associate a device (in the
> case of drivers) with a specific namespace.  Right?

Right.

> 
> If I get it right, it can make sense, but I'd like an ack from the net
> maintainers to be sure we are doing the right thing.
> 
> Also I think we should have a comment on top of this function to make it
> clear that it should be used only by transports that don't support
> namespaces, and to explain why we used NULL and GLOBAL. Plus a comment at
> the top of this file (near where we described local vs global) to clarify
> the status of this.
> 
> That said, if net-next closes next week, I think we can send a follow-up
> patch just for those comments, so:

Sounds good, I'll wait for further feedback before sending anything!

> 
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> 

> > +}
> > EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
> > 
> > -struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > -					 struct sockaddr_vm *dst)
> > +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > +					     struct sockaddr_vm *dst,
> > +					     struct net *net,
> > +					     enum vsock_net_mode net_mode)
> > {
> > 	struct sock *sk;
> > 
> > 	spin_lock_bh(&vsock_table_lock);
> > -	sk = __vsock_find_connected_socket(src, dst);
> > +	sk = __vsock_find_connected_socket_net(src, dst, net, net_mode);
> > 	if (sk)
> > 		sock_hold(sk);
> > 
> > @@ -333,6 +407,14 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > 
> > 	return sk;
> > }
> > +EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
> > +
> > +struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > +					 struct sockaddr_vm *dst)
> > +{
> > +	return vsock_find_connected_socket_net(src, dst,
> > +					       NULL, VSOCK_NET_MODE_GLOBAL);
> > +}
> > EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
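
The write-once ns_mode setting and the 8-byte VSOCK_NET_MODE_STR_MAX buffer from the quoted patch can also be sketched in user-space C. This is a minimal sketch, not the patch's sysctl handler: struct ns_state, ns_mode_write() and the -1 error return are hypothetical, and stripping the trailing newline by hand is an assumption made so the example stands alone.

```c
#include <stdbool.h>
#include <string.h>

/* "global" is 6 chars, plus 1 for '\n' and 1 for the terminator. */
#define MODE_STR_MAX 8

enum net_mode { NET_MODE_GLOBAL, NET_MODE_LOCAL };

/* Hypothetical per-namespace state; the default mode is global. */
struct ns_state {
	enum net_mode mode;
	bool locked;	/* set once the mode has been written */
};

/* Returns 0 on success, -1 if the input is invalid or the mode
 * was already written once.
 */
static int ns_mode_write(struct ns_state *st, const char *buf)
{
	char tmp[MODE_STR_MAX];
	size_t len = strlen(buf);

	if (st->locked || len >= sizeof(tmp))
		return -1;

	memcpy(tmp, buf, len + 1);
	if (len > 0 && tmp[len - 1] == '\n')
		tmp[--len] = '\0';	/* drop the trailing newline */

	if (strcmp(tmp, "global") == 0)
		st->mode = NET_MODE_GLOBAL;
	else if (strcmp(tmp, "local") == 0)
		st->mode = NET_MODE_LOCAL;
	else
		return -1;

	st->locked = true;
	return 0;
}
```

In this sketch a rejected string leaves the state unlocked, so a namespace manager can retry with a valid mode; after the first accepted write, every later write fails and the mode stays fixed.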