[PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348

shaopeijie@cestc.cn posted 1 patch 10 months, 1 week ago
drivers/nvme/host/tcp.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
[PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by shaopeijie@cestc.cn 10 months, 1 week ago
From: Peijie Shao <shaopeijie@cestc.cn>

The patch is for nvme-tcp host side.

commit 1be52169c348
("nvme-tcp: fix selinux denied when calling sock_sendmsg")
uses sock_create_kern instead of sock_create to solve SELinux
problem, however sock_create_kern does not take a reference of
the given netns, which results in a use-after-free when the
non-init_net netns is destroyed before sock_release.

For example: a container not share with host's network namespace
doing a 'nvme connect', and is stopped without 'nvme disconnect'.

The patch changes parameter current->nsproxy->net_ns to init_net,
makes the socket always belongs to the host. It also naturally
avoids changing sock's netns from previous creator's netns to
init_net when sock is re-created by nvme recovery path
(workqueue is in init_net namespace).

Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
---

Changes in v2:
    1. Fix style problems reviewed by Christoph Hellwig, thanks!
    2. Add 'nvme-tcp:' prefix for the patch.

Version v1:
Hi all,
This is the v1 patch. Before this version, I tried to
get_net(current->nsproxy->net_ns) in nvme_tcp_alloc_queue() to
fix the issue, but failed to find a suitable placeto do
put_net(). Because the socket is released by fput() internally.
I think code like below:
    nvme_tcp_free_queue() {
        fput()
        put_net()
    }
can not ensure the socket was released before put_net, since
someone is still holding the file.

So I would like to use the 'init_net' net namespace.

---
 drivers/nvme/host/tcp.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 26c459f0198d..9b1d0ad18b77 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1789,8 +1789,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 		queue->cmnd_capsule_len = sizeof(struct nvme_command) +
 						NVME_TCP_ADMIN_CCSZ;
 
-	ret = sock_create_kern(current->nsproxy->net_ns,
-			ctrl->addr.ss_family, SOCK_STREAM,
+	/*
+	 * sock_create_kern() does not take a reference to
+	 * current->nsproxy->net_ns, use init_net instead.
+	 * This also avoid changing sock's netns from previous
+	 * creator's netns to init_net when sock is re-created
+	 * by nvme recovery path.
+	 */
+	ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM,
 			IPPROTO_TCP, &queue->sock);
 	if (ret) {
 		dev_err(nctrl->device,
-- 
2.43.0
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Christoph Hellwig 10 months ago
I had another look at this patch, and it feels wrong to me.  I don't
think we are supposed to create sockets triggered by activity in
a network namespace in the global namespace even if they are indirectly
created through the nvme interface.  But maybe I'm misunderstanding
how network namespaces work, which is entirely possible.

So to avoid the failure I'd be tempted to instead revert commit
1be52169c348 until the problem is fully sorted out.
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Kuniyuki Iwashima 10 months ago
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 7 Apr 2025 16:31:21 +0200
> I had another look at this patch, and it feels wrong to me.  I don't
> think we are supposed to create sockets triggered by activity in
> a network namespace in the global namespace even if they are indirectly
> created through the nvme interface.  But maybe I'm misunderstanding
> how network namespaces work, which is entirely possible.
> 
> So to avoid the failure I'd be tempted to instead revert commit
> 1be52169c348 until the problem is fully sorted out.

The followup patch is wrong, and the correct fix is to take a reference
to the netns by sk_net_refcnt_upgrade().

---8<---
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 26c459f0198d..72d260201d8c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1803,6 +1803,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 		ret = PTR_ERR(sock_file);
 		goto err_destroy_mutex;
 	}
+
+	sk_net_refcnt_upgrade(queue->sock->sk);
 	nvme_tcp_reclassify_socket(queue->sock);
 
 	/* Single syn retry */
---8<---
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Christoph Hellwig 10 months ago
On Mon, Apr 07, 2025 at 10:18:18AM -0700, Kuniyuki Iwashima wrote:
> The followup patch is wrong, and the correct fix is to take a reference
> to the netns by sk_net_refcnt_upgrade().

Can you send a formal patch for this?
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Kuniyuki Iwashima 10 months ago
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 8 Apr 2025 07:04:08 +0200
> On Mon, Apr 07, 2025 at 10:18:18AM -0700, Kuniyuki Iwashima wrote:
> > The followup patch is wrong, and the correct fix is to take a reference
> > to the netns by sk_net_refcnt_upgrade().
> 
> Can you send a formal patch for this?

For sure.

Which branch/tag should be based on, for-next or nvme-6.15 ?
http://git.infradead.org/nvme.git
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Christoph Hellwig 10 months ago
On Mon, Apr 07, 2025 at 10:55:27PM -0700, Kuniyuki Iwashima wrote:
> Which branch/tag should be based on, for-next or nvme-6.15 ?
> http://git.infradead.org/nvme.git

nvme-6.15 is the canonical tree, but for bug fixes I'm fine with
almost anything :)
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Kuniyuki Iwashima 10 months ago
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 8 Apr 2025 07:58:30 +0200
> On Mon, Apr 07, 2025 at 10:55:27PM -0700, Kuniyuki Iwashima wrote:
> > Which branch/tag should be based on, for-next or nvme-6.15 ?
> > http://git.infradead.org/nvme.git
> 
> nvme-6.15 is the canonical tree, but for bug fixes I'm fine with
> almost anything :)

Thanks, will post a patch tomorrow :)
Re: [PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348
Posted by Christoph Hellwig 10 months, 1 week ago
I'll do another minor fixup for the comment formatting, but otherwise
this looks good.  I'll queue it up.

On Thu, Apr 03, 2025 at 10:47:48PM +0800, shaopeijie@cestc.cn wrote:
> From: Peijie Shao <shaopeijie@cestc.cn>
> 
> The patch is for nvme-tcp host side.
> 
> commit 1be52169c348
> ("nvme-tcp: fix selinux denied when calling sock_sendmsg")
> uses sock_create_kern instead of sock_create to solve SELinux
> problem, however sock_create_kern does not take a reference of
> the given netns, which results in a use-after-free when the
> non-init_net netns is destroyed before sock_release.
> 
> For example: a container not share with host's network namespace
> doing a 'nvme connect', and is stopped without 'nvme disconnect'.
> 
> The patch changes parameter current->nsproxy->net_ns to init_net,
> makes the socket always belongs to the host. It also naturally
> avoids changing sock's netns from previous creator's netns to
> init_net when sock is re-created by nvme recovery path
> (workqueue is in init_net namespace).
> 
> Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>

> ---
> 
> Changes in v2:
>     1. Fix style problems reviewed by Christoph Hellwig, thanks!
>     2. Add 'nvme-tcp:' prefix for the patch.
> 
> Version v1:
> Hi all,
> This is the v1 patch. Before this version, I tried to
> get_net(current->nsproxy->net_ns) in nvme_tcp_alloc_queue() to
> fix the issue, but failed to find a suitable placeto do
> put_net(). Because the socket is released by fput() internally.
> I think code like below:
>     nvme_tcp_free_queue() {
>         fput()
>         put_net()
>     }
> can not ensure the socket was released before put_net, since
> someone is still holding the file.
> 
> So I would like to use the 'init_net' net namespace.
> 
> ---
>  drivers/nvme/host/tcp.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 26c459f0198d..9b1d0ad18b77 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1789,8 +1789,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
>  		queue->cmnd_capsule_len = sizeof(struct nvme_command) +
>  						NVME_TCP_ADMIN_CCSZ;
>  
> -	ret = sock_create_kern(current->nsproxy->net_ns,
> -			ctrl->addr.ss_family, SOCK_STREAM,
> +	/*
> +	 * sock_create_kern() does not take a reference to
> +	 * current->nsproxy->net_ns, use init_net instead.
> +	 * This also avoid changing sock's netns from previous
> +	 * creator's netns to init_net when sock is re-created
> +	 * by nvme recovery path.
> +	 */
> +	ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM,
>  			IPPROTO_TCP, &queue->sock);
>  	if (ret) {
>  		dev_err(nctrl->device,
> -- 
> 2.43.0
> 
> 
---end quoted text---