[PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones

alistair23@gmail.com posted 1 patch 2 months ago
drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
1 file changed, 7 insertions(+)
[PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones
Posted by alistair23@gmail.com 2 months ago
From: Alistair Francis <alistair.francis@wdc.com>

Previously after the host sends a REPLACETLSPSK we freed the TLS keys as
part of calling nvmet_auth_sq_free() on success. A recent change ensured
we don't free the keys, allowing REPLACETLSPSK to work.

But that fix results in a kernel memory leak when running

```
nvme_trtype=loop ./check nvme/041 nvme/042 nvme/043 nvme/044 nvme/045 nvme/051 nvme/052
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
```

We can't free the keys on a successful DHCHAP operation, otherwise the
next REPLACETLSPSK will fail, so instead let's free them before we
replace them as part of nvmet_auth_challenge().

This ensures that REPLACETLSPSK works, while also avoiding any memory
leaks.

Fixes: 2e6eb6b277f59 ("nvmet-tcp: Don't free SQ on authentication success")
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
---
 drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
index b9ab80c7a6941..58185184478a4 100644
--- a/drivers/nvme/target/fabrics-cmd-auth.c
+++ b/drivers/nvme/target/fabrics-cmd-auth.c
@@ -412,6 +412,13 @@ static int nvmet_auth_challenge(struct nvmet_req *req, void *d, int al)
 	int hash_len = nvme_auth_hmac_hash_len(ctrl->shash_id);
 	int data_size = sizeof(*d) + hash_len;
 
+	/*
+	 * If replacing the keys then we have previous successful keys
+	 * that might be leaked, so we need to free them here.
+	 */
+	if (req->sq->dhchap_c1)
+		nvmet_auth_sq_free(req->sq);
+
 	if (ctrl->dh_tfm)
 		data_size += ctrl->dh_keysize;
 	if (al < data_size) {
-- 
2.53.0
Re: [PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones
Posted by Hannes Reinecke 2 months ago
On 4/16/26 01:02, alistair23@gmail.com wrote:
> From: Alistair Francis <alistair.francis@wdc.com>
> 
> Previously after the host sends a REPLACETLSPSK we freed the TLS keys as
> part of calling nvmet_auth_sq_free() on success. A recent change ensured
> we don't free the keys, allowing REPLACETLSPSK to work.
> 
> But that fix results in a kernel memory leak when running
> 
> ```
> nvme_trtype=loop ./check nvme/041 nvme/042 nvme/043 nvme/044 nvme/045 nvme/051 nvme/052
> echo scan > /sys/kernel/debug/kmemleak
> cat /sys/kernel/debug/kmemleak
> ```
> 
> We can't free the keys on a successful DHCHAP operation, otherwise the
> next REPLACETLSPSK will fail, so instead let's free them before we
> replace them as part of nvmet_auth_challenge().
> 
> This ensures that REPLACETLSPSK works, while also avoiding any memory
> leaks.
> 
> Fixes: 2e6eb6b277f59 ("nvmet-tcp: Don't free SQ on authentication success")
> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
> ---
>   drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
> index b9ab80c7a6941..58185184478a4 100644
> --- a/drivers/nvme/target/fabrics-cmd-auth.c
> +++ b/drivers/nvme/target/fabrics-cmd-auth.c
> @@ -412,6 +412,13 @@ static int nvmet_auth_challenge(struct nvmet_req *req, void *d, int al)
>   	int hash_len = nvme_auth_hmac_hash_len(ctrl->shash_id);
>   	int data_size = sizeof(*d) + hash_len;
>   
> +	/*
> +	 * If replacing the keys then we have previous successful keys
> +	 * that might be leaked, so we need to free them here.
> +	 */
> +	if (req->sq->dhchap_c1)
> +		nvmet_auth_sq_free(req->sq);
> +
>   	if (ctrl->dh_tfm)
>   		data_size += ctrl->dh_keysize;
>   	if (al < data_size) {
I am not sure.
The authentication variables should be freed as soon as the 
authentication completes; the session key is ephemeral and
should not be stored longer than necessary and will _never_
be used again once authentication completes.
The TLS key, OTOH, is used throughout the session and needs
to be present while the session is active
As such, both sets have vastly different lifetimes, and
I would argue that this

void nvmet_auth_sq_free(struct nvmet_sq *sq)
{
	cancel_delayed_work(&sq->auth_expired_work);
#ifdef CONFIG_NVME_TARGET_TCP_TLS
	sq->tls_key = NULL;
#endif
	kfree(sq->dhchap_c1);
	sq->dhchap_c1 = NULL;

is actually wrong as we should not modify 'tls_key' here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Re: [PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones
Posted by Chris Leech 2 months ago
On Thu, Apr 16, 2026 at 08:16:14AM +0200, Hannes Reinecke wrote:
> On 4/16/26 01:02, alistair23@gmail.com wrote:
> > From: Alistair Francis <alistair.francis@wdc.com>
> > 
> > Previously after the host sends a REPLACETLSPSK we freed the TLS keys as
> > part of calling nvmet_auth_sq_free() on success. A recent change ensured
> > we don't free the keys, allowing REPLACETLSPSK to work.
> > 
> > But that fix results in a kernel memory leak when running
> > 
> > ```
> > nvme_trtype=loop ./check nvme/041 nvme/042 nvme/043 nvme/044 nvme/045 nvme/051 nvme/052
> > echo scan > /sys/kernel/debug/kmemleak
> > cat /sys/kernel/debug/kmemleak
> > ```
> > 
> > We can't free the keys on a successful DHCHAP operation, otherwise the
> > next REPLACETLSPSK will fail, so instead let's free them before we
> > replace them as part of nvmet_auth_challenge().
> > 
> > This ensures that REPLACETLSPSK works, while also avoiding any memory
> > leaks.
> > 
> > Fixes: 2e6eb6b277f59 ("nvmet-tcp: Don't free SQ on authentication success")
> > Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
> > ---
> >   drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
> > index b9ab80c7a6941..58185184478a4 100644
> > --- a/drivers/nvme/target/fabrics-cmd-auth.c
> > +++ b/drivers/nvme/target/fabrics-cmd-auth.c
> > @@ -412,6 +412,13 @@ static int nvmet_auth_challenge(struct nvmet_req *req, void *d, int al)
> >   	int hash_len = nvme_auth_hmac_hash_len(ctrl->shash_id);
> >   	int data_size = sizeof(*d) + hash_len;
> > +	/*
> > +	 * If replacing the keys then we have previous successful keys
> > +	 * that might be leaked, so we need to free them here.
> > +	 */
> > +	if (req->sq->dhchap_c1)
> > +		nvmet_auth_sq_free(req->sq);
> > +
> >   	if (ctrl->dh_tfm)
> >   		data_size += ctrl->dh_keysize;
> >   	if (al < data_size) {
> I am not sure.
> The authentication variables should be freed as soon as the authentication
> completes; the session key is ephemeral and
> should not be stored longer than necessary and will _never_
> be used again once authentication completes.
> The TLS key, OTOH, is used throughout the session and needs
> to be present while the session is active
> As such, both sets have vastly different lifetimes, and
> I would argue that this
> 
> void nvmet_auth_sq_free(struct nvmet_sq *sq)
> {
> 	cancel_delayed_work(&sq->auth_expired_work);
> #ifdef CONFIG_NVME_TARGET_TCP_TLS
> 	sq->tls_key = NULL;
> #endif
> 	kfree(sq->dhchap_c1);
> 	sq->dhchap_c1 = NULL;
> 
> is actually wrong as we should not modify 'tls_key' here.

I agree with Hannes, and was just about to respond with the same
feedback.  I think the freeing of the auth temporaries needs to be
returned to fix the memleak, and the real problem is the setting of
tls_key to NULL.  That doesn't seem like the right lifetime for tls_key,
and it looks to be a reference count leak as well.

Is the presence of sq->tls_key the best check to see if the socket is
currently in a kTLS mode?  (it might be, I'm not as up on the target
code)

- Chris
Re: [PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones
Posted by Alistair Francis 2 months ago
On Fri, Apr 17, 2026 at 1:27 AM Chris Leech <cleech@redhat.com> wrote:
>
> On Thu, Apr 16, 2026 at 08:16:14AM +0200, Hannes Reinecke wrote:
> > On 4/16/26 01:02, alistair23@gmail.com wrote:
> > > From: Alistair Francis <alistair.francis@wdc.com>
> > >
> > > Previously after the host sends a REPLACETLSPSK we freed the TLS keys as
> > > part of calling nvmet_auth_sq_free() on success. A recent change ensured
> > > we don't free the keys, allowing REPLACETLSPSK to work.
> > >
> > > But that fix results in a kernel memory leak when running
> > >
> > > ```
> > > nvme_trtype=loop ./check nvme/041 nvme/042 nvme/043 nvme/044 nvme/045 nvme/051 nvme/052
> > > echo scan > /sys/kernel/debug/kmemleak
> > > cat /sys/kernel/debug/kmemleak
> > > ```
> > >
> > > We can't free the keys on a successful DHCHAP operation, otherwise the
> > > next REPLACETLSPSK will fail, so instead let's free them before we
> > > replace them as part of nvmet_auth_challenge().
> > >
> > > This ensures that REPLACETLSPSK works, while also avoiding any memory
> > > leaks.
> > >
> > > Fixes: 2e6eb6b277f59 ("nvmet-tcp: Don't free SQ on authentication success")
> > > Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
> > > ---
> > >   drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
> > >   1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
> > > index b9ab80c7a6941..58185184478a4 100644
> > > --- a/drivers/nvme/target/fabrics-cmd-auth.c
> > > +++ b/drivers/nvme/target/fabrics-cmd-auth.c
> > > @@ -412,6 +412,13 @@ static int nvmet_auth_challenge(struct nvmet_req *req, void *d, int al)
> > >     int hash_len = nvme_auth_hmac_hash_len(ctrl->shash_id);
> > >     int data_size = sizeof(*d) + hash_len;
> > > +   /*
> > > +    * If replacing the keys then we have previous successful keys
> > > +    * that might be leaked, so we need to free them here.
> > > +    */
> > > +   if (req->sq->dhchap_c1)
> > > +           nvmet_auth_sq_free(req->sq);
> > > +
> > >     if (ctrl->dh_tfm)
> > >             data_size += ctrl->dh_keysize;
> > >     if (al < data_size) {
> > I am not sure.
> > The authentication variables should be freed as soon as the authentication
> > completes; the session key is ephemeral and
> > should not be stored longer than necessary and will _never_
> > be used again once authentication completes.
> > The TLS key, OTOH, is used throughout the session and needs
> > to be present while the session is active
> > As such, both sets have vastly different lifetimes, and
> > I would argue that this
> >
> > void nvmet_auth_sq_free(struct nvmet_sq *sq)
> > {
> >       cancel_delayed_work(&sq->auth_expired_work);
> > #ifdef CONFIG_NVME_TARGET_TCP_TLS
> >       sq->tls_key = NULL;
> > #endif
> >       kfree(sq->dhchap_c1);
> >       sq->dhchap_c1 = NULL;
> >
> > is actually wrong as we should not modify 'tls_key' here.
>
> I agree with Hannes, and was just about to respond with the same
> feedback.  I think the freeing of the auth temporaries needs to be
> returned to fix the memleak, and the real problem is the setting of
> tls_key to NULL.  That doesn't seem like the right lifetime for tls_key,
> and it looks to be a reference count leak as well.

Yep, agreed. Patches sent to revert the free and remove the
`sq->tls_key = NULL;`

>
> Is the presence of sq->tls_key the best check to see if the socket is
> currently in a kTLS mode?  (it might be, I'm not as up on the target
> code)

I think it is correct, the issue is just that we are setting sq->tls_key to NULL

Alistair
Re: [PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones
Posted by Christoph Hellwig 2 months ago
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>