drivers/nvme/target/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
From: Peijie Shao <shaopeijie@cestc.cn>
Replacing sock_create() with sock_create_kern()
changes the socket object's label to kernel_t,
thereby bypassing unnecessary SELinux permission
checks. It also helps to avoid copy and paste bugs.
Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
---
drivers/nvme/target/tcp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 4f9cac8a5abe..216afacc8179 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2049,7 +2049,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
if (port->nport->inline_data_size < 0)
port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
- ret = sock_create(port->addr.ss_family, SOCK_STREAM,
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ port->addr.ss_family, SOCK_STREAM,
IPPROTO_TCP, &port->sock);
if (ret) {
pr_err("failed to create a socket\n");
--
2.43.0
On 3/23/25 20:17, shaopeijie@cestc.cn wrote: > From: Peijie Shao<shaopeijie@cestc.cn> > > Replacing sock_create() with sock_create_kern() > changes the socket object's label to kernel_t, > thereby bypassing unnecessary SELinux permission > checks. It also helps to avoid copy and paste bugs. > > Signed-off-by: Peijie Shao<shaopeijie@cestc.cn> can you please add the version history from next time when posting new version ? -ck
On Mon, 24 Mar 2025 11:17:08 +0800
shaopeijie@cestc.cn wrote:
> From: Peijie Shao <shaopeijie@cestc.cn>
>
> Replacing sock_create() with sock_create_kern()
> changes the socket object's label to kernel_t,
> thereby bypassing unnecessary SELinux permission
> checks. It also helps to avoid copy and paste bugs.
Does sock_create_kern() hold a reference on the namespace?
It hadn't used to and sock_create() will take one.
David
>
> Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
> ---
> drivers/nvme/target/tcp.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 4f9cac8a5abe..216afacc8179 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -2049,7 +2049,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
> if (port->nport->inline_data_size < 0)
> port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
>
> - ret = sock_create(port->addr.ss_family, SOCK_STREAM,
> + ret = sock_create_kern(current->nsproxy->net_ns,
> + port->addr.ss_family, SOCK_STREAM,
> IPPROTO_TCP, &port->sock);
> if (ret) {
> pr_err("failed to create a socket\n");
On 3/25/2025 5:04 AM, David Laight wrote:
> On Mon, 24 Mar 2025 11:17:08 +0800
> shaopeijie@cestc.cn wrote:
>
>> From: Peijie Shao <shaopeijie@cestc.cn>
>>
>> Replacing sock_create() with sock_create_kern()
>> changes the socket object's label to kernel_t,
>> thereby bypassing unnecessary SELinux permission
>> checks. It also helps to avoid copy and paste bugs.
>
> Does sock_create_kern() hold a reference on the namespace?
> It hadn't used to and sock_create() will take one.
>
> David
>
>>
>> Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
>> ---
>> drivers/nvme/target/tcp.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 4f9cac8a5abe..216afacc8179 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -2049,7 +2049,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
>> if (port->nport->inline_data_size < 0)
>> port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
>>
>> - ret = sock_create(port->addr.ss_family, SOCK_STREAM,
>> + ret = sock_create_kern(current->nsproxy->net_ns,
>> + port->addr.ss_family, SOCK_STREAM,
>> IPPROTO_TCP, &port->sock);
>> if (ret) {
>> pr_err("failed to create a socket\n");
>
>
>
Thanks for reviewing. Indeed, there is a netns UAF issue. The call path
sock_create() -> __sock_create() -> inet_create() -> sk_alloc() takes a
reference of the given netns, however sock_create_kern() doesn't perform
this step.
Sorroy the patch for the host side '[PATCH] nvme-tcp: fix selinux denied
when calling sock_sendmsg' has the same problem.
I will send a v2 patch to fix both host and target side soon, thanks.
From: Peijie Shao <shaopeijie@cestc.cn>
The patch is for nvme-tcp host side.
commit 1be52169c348
("nvme-tcp: fix selinux denied when calling sock_sendmsg")
uses sock_create_kern instead of sock_create to solve SELinux
problem, however sock_create_kern does not take a reference of
given netns, which results in a use-after-freewhen the
non-init_net netns is destroyed before sock_release.
For example: a container not share with host's network namespace
doing a 'nvme connect', and is stopped without 'nvme disconnect'.
The patch changes parameter current->nsproxy->net_ns to init_net,
makes the socket always belongs to the host. It also naturally
avoids changing sock's netns from previous creator's netns to
init_net when sock is re-created by nvme recovery path
(workqueue is in init_net namespace).
Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
---
Hi all,
This is the v1 patch. Before this version, I tried to
get_net(current->nsproxy->net_ns) in nvme_tcp_alloc_queue() to
fix the issue, but failed to find a suitable placeto do
put_net(). Because the socket is released by fput() internally.
I think code like below:
nvme_tcp_free_queue() {
fput()
put_net()
}
can not ensure the socket was released before put_net, since
someone is still holding the file.
So I prefer using the 'init_net' net namespace.
---
drivers/nvme/host/tcp.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 26c459f0198d..1b2d3d37656d 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1789,7 +1789,13 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->cmnd_capsule_len = sizeof(struct nvme_command) +
NVME_TCP_ADMIN_CCSZ;
- ret = sock_create_kern(current->nsproxy->net_ns,
+ /* sock_create_kern() does not take a reference to
+ * current->nsproxy->net_ns, use init_net instead.
+ * This also avoid changing sock's netns from previous
+ * creator's netns to init_net when sock is re-created
+ * by nvme recovery path.
+ */
+ ret = sock_create_kern(&init_net,
ctrl->addr.ss_family, SOCK_STREAM,
IPPROTO_TCP, &queue->sock);
if (ret) {
--
2.43.0
On Tue, Apr 01, 2025 at 02:19:34PM +0800, shaopeijie@cestc.cn wrote: > + /* sock_create_kern() does not take a reference to > + * current->nsproxy->net_ns, use init_net instead. > + * This also avoid changing sock's netns from previous > + * creator's netns to init_net when sock is re-created > + * by nvme recovery path. > + */ Kernel comment style is /* * .... */ > + ret = sock_create_kern(&init_net, > ctrl->addr.ss_family, SOCK_STREAM, > IPPROTO_TCP, &queue->sock); This can be realigned: ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM, IPPROTO_TCP, &queue->sock);
On Thu, Apr 03, 2025 at 06:30:01AM +0200, Christoph Hellwig wrote: > > + ret = sock_create_kern(&init_net, > > ctrl->addr.ss_family, SOCK_STREAM, > > IPPROTO_TCP, &queue->sock); > > This can be realigned: > > ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM, > IPPROTO_TCP, &queue->sock); Also did the original patch get merged already? If not please fold both into a single one.
On 4/3/2025 12:30 PM, Christoph Hellwig wrote: > On Thu, Apr 03, 2025 at 06:30:01AM +0200, Christoph Hellwig wrote: >>> + ret = sock_create_kern(&init_net, >>> ctrl->addr.ss_family, SOCK_STREAM, >>> IPPROTO_TCP, &queue->sock); >> >> This can be realigned: >> >> ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM, >> IPPROTO_TCP, &queue->sock); > > Also did the original patch get merged already? If not please fold > both into a single one. > > Thanks! Yes,the patch is merged into https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1be52169c3488ef98582ed553ab35cefa3978817 Style problems will be fixed in v2 version.
From: Peijie Shao <shaopeijie@cestc.cn>
The patch is for nvme-tcp host side.
commit 1be52169c348
("nvme-tcp: fix selinux denied when calling sock_sendmsg")
uses sock_create_kern instead of sock_create to solve SELinux
problem, however sock_create_kern does not take a reference of
the given netns, which results in a use-after-free when the
non-init_net netns is destroyed before sock_release.
For example: a container not share with host's network namespace
doing a 'nvme connect', and is stopped without 'nvme disconnect'.
The patch changes parameter current->nsproxy->net_ns to init_net,
makes the socket always belongs to the host. It also naturally
avoids changing sock's netns from previous creator's netns to
init_net when sock is re-created by nvme recovery path
(workqueue is in init_net namespace).
Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
---
Changes in v2:
1. Fix style problems reviewed by Christoph Hellwig, thanks!
2. Add 'nvme-tcp:' prefix for the patch.
Version v1:
Hi all,
This is the v1 patch. Before this version, I tried to
get_net(current->nsproxy->net_ns) in nvme_tcp_alloc_queue() to
fix the issue, but failed to find a suitable placeto do
put_net(). Because the socket is released by fput() internally.
I think code like below:
nvme_tcp_free_queue() {
fput()
put_net()
}
can not ensure the socket was released before put_net, since
someone is still holding the file.
So I would like to use the 'init_net' net namespace.
---
drivers/nvme/host/tcp.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 26c459f0198d..9b1d0ad18b77 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1789,8 +1789,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->cmnd_capsule_len = sizeof(struct nvme_command) +
NVME_TCP_ADMIN_CCSZ;
- ret = sock_create_kern(current->nsproxy->net_ns,
- ctrl->addr.ss_family, SOCK_STREAM,
+ /*
+ * sock_create_kern() does not take a reference to
+ * current->nsproxy->net_ns, use init_net instead.
+ * This also avoid changing sock's netns from previous
+ * creator's netns to init_net when sock is re-created
+ * by nvme recovery path.
+ */
+ ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM,
IPPROTO_TCP, &queue->sock);
if (ret) {
dev_err(nctrl->device,
--
2.43.0
I had another look at this patch, and it feels wrong to me. I don't think we are supposed to create sockets triggered by activity in a network namespace in the global namespace even if they are indirectly created through the nvme interface. But maybe I'm misunderstanding how network namespaces work, which is entirely possible. So to avoid the failure I'd be tempted to instead revert commit 1be52169c348 until the problem is fully sorted out.
From: Christoph Hellwig <hch@lst.de> Date: Mon, 7 Apr 2025 16:31:21 +0200 > I had another look at this patch, and it feels wrong to me. I don't > think we are supposed to create sockets triggered by activity in > a network namespace in the global namespace even if they are indirectly > created through the nvme interface. But maybe I'm misunderstanding > how network namespaces work, which is entirely possible. > > So to avoid the failure I'd be tempted to instead revert commit > 1be52169c348 until the problem is fully sorted out. The followup patch is wrong, and the correct fix is to take a reference to the netns by sk_net_refcnt_upgrade(). ---8<--- diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 26c459f0198d..72d260201d8c 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1803,6 +1803,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid, ret = PTR_ERR(sock_file); goto err_destroy_mutex; } + + sk_net_refcnt_upgrade(queue->sock->sk); nvme_tcp_reclassify_socket(queue->sock); /* Single syn retry */ ---8<---
On Mon, Apr 07, 2025 at 10:18:18AM -0700, Kuniyuki Iwashima wrote: > The followup patch is wrong, and the correct fix is to take a reference > to the netns by sk_net_refcnt_upgrade(). Can you send a formal patch for this?
From: Christoph Hellwig <hch@lst.de> Date: Tue, 8 Apr 2025 07:04:08 +0200 > On Mon, Apr 07, 2025 at 10:18:18AM -0700, Kuniyuki Iwashima wrote: > > The followup patch is wrong, and the correct fix is to take a reference > > to the netns by sk_net_refcnt_upgrade(). > > Can you send a formal patch for this? For sure. Which branch/tag should be based on, for-next or nvme-6.15 ? http://git.infradead.org/nvme.git
On Mon, Apr 07, 2025 at 10:55:27PM -0700, Kuniyuki Iwashima wrote: > Which branch/tag should be based on, for-next or nvme-6.15 ? > http://git.infradead.org/nvme.git nvme-6.15 is the canonical tree, but for bug fixes I'm fine with almost anything :)
From: Christoph Hellwig <hch@lst.de> Date: Tue, 8 Apr 2025 07:58:30 +0200 > On Mon, Apr 07, 2025 at 10:55:27PM -0700, Kuniyuki Iwashima wrote: > > Which branch/tag should be based on, for-next or nvme-6.15 ? > > http://git.infradead.org/nvme.git > > nvme-6.15 is the canonical tree, but for bug fixes I'm fine with > almost anything :) Thanks, will post a patch tomorrow :)
I'll do another minor fixup for the comment formatting, but otherwise
this looks good. I'll queue it up.
On Thu, Apr 03, 2025 at 10:47:48PM +0800, shaopeijie@cestc.cn wrote:
> From: Peijie Shao <shaopeijie@cestc.cn>
>
> The patch is for nvme-tcp host side.
>
> commit 1be52169c348
> ("nvme-tcp: fix selinux denied when calling sock_sendmsg")
> uses sock_create_kern instead of sock_create to solve SELinux
> problem, however sock_create_kern does not take a reference of
> the given netns, which results in a use-after-free when the
> non-init_net netns is destroyed before sock_release.
>
> For example: a container not share with host's network namespace
> doing a 'nvme connect', and is stopped without 'nvme disconnect'.
>
> The patch changes parameter current->nsproxy->net_ns to init_net,
> makes the socket always belongs to the host. It also naturally
> avoids changing sock's netns from previous creator's netns to
> init_net when sock is re-created by nvme recovery path
> (workqueue is in init_net namespace).
>
> Signed-off-by: Peijie Shao <shaopeijie@cestc.cn>
> ---
>
> Changes in v2:
> 1. Fix style problems reviewed by Christoph Hellwig, thanks!
> 2. Add 'nvme-tcp:' prefix for the patch.
>
> Version v1:
> Hi all,
> This is the v1 patch. Before this version, I tried to
> get_net(current->nsproxy->net_ns) in nvme_tcp_alloc_queue() to
> fix the issue, but failed to find a suitable placeto do
> put_net(). Because the socket is released by fput() internally.
> I think code like below:
> nvme_tcp_free_queue() {
> fput()
> put_net()
> }
> can not ensure the socket was released before put_net, since
> someone is still holding the file.
>
> So I would like to use the 'init_net' net namespace.
>
> ---
> drivers/nvme/host/tcp.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 26c459f0198d..9b1d0ad18b77 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1789,8 +1789,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
> queue->cmnd_capsule_len = sizeof(struct nvme_command) +
> NVME_TCP_ADMIN_CCSZ;
>
> - ret = sock_create_kern(current->nsproxy->net_ns,
> - ctrl->addr.ss_family, SOCK_STREAM,
> + /*
> + * sock_create_kern() does not take a reference to
> + * current->nsproxy->net_ns, use init_net instead.
> + * This also avoid changing sock's netns from previous
> + * creator's netns to init_net when sock is re-created
> + * by nvme recovery path.
> + */
> + ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM,
> IPPROTO_TCP, &queue->sock);
> if (ret) {
> dev_err(nctrl->device,
> --
> 2.43.0
>
>
---end quoted text---
© 2016 - 2025 Red Hat, Inc.