mm/zswap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
This is a hotfix for a potential zpool memory leak that could result in
the existing zswap_decompress():
mutex_unlock(&acomp_ctx->mutex);
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
Releasing the lock before the conditional does not protect the integrity of
"src", which is set earlier under the acomp_ctx mutex lock. This poses a
risk for the conditional behaving as intended, and consequently not
unmapping the zpool handle, which could cause a zswap zpool memory leak.
This patch moves the mutex_unlock() to occur after the conditional and
subsequent zpool_unmap_handle(). This ensures that the value of "src"
obtained earlier, with the mutex locked, does not change.
Even though an actual memory leak was not observed, this fix seems like a
cleaner implementation.
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
---
mm/zswap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb23..58810fa8ff23 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -986,10 +986,11 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
- mutex_unlock(&acomp_ctx->mutex);
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
+
+ mutex_unlock(&acomp_ctx->mutex);
}
/*********************************
base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
--
2.27.0
On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar <kanchana.p.sridhar@intel.com> wrote: > > This is a hotfix for a potential zpool memory leak that could result in > the existing zswap_decompress(): > > mutex_unlock(&acomp_ctx->mutex); > > if (src != acomp_ctx->buffer) > zpool_unmap_handle(zpool, entry->handle); > > Releasing the lock before the conditional does not protect the integrity of > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > risk for the conditional behaving as intended, and consequently not > unmapping the zpool handle, which could cause a zswap zpool memory leak. > > This patch moves the mutex_unlock() to occur after the conditional and > subsequent zpool_unmap_handle(). This ensures that the value of "src" > obtained earlier, with the mutex locked, does not change. The commit log is too complicated and incorrect. It is talking about the stability of 'src', but that's a local variable on the stack anyway. It doesn't need protection. The problem is 'acomp_ctx->buffer' being reused and changed after the mutex is released. Leading to the check not being reliable. Please simplify this. > > Even though an actual memory leak was not observed, this fix seems like a > cleaner implementation. > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > --- > mm/zswap.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index f6316b66fb23..58810fa8ff23 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -986,10 +986,11 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio) > acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE); > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait)); > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > - mutex_unlock(&acomp_ctx->mutex); > > if (src != acomp_ctx->buffer) > zpool_unmap_handle(zpool, entry->handle); Actually now that I think more about it, I think this check isn't entirely safe, even under the lock. Is it possible that 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous decompression at the same handle? In this case, we will also mistakenly skip the unmap. It would be more reliable to set a boolean variable if we copy to acomp_ctx->buffer and do the unmap, and check that flag here to check if the unmap was done or not. > + > + mutex_unlock(&acomp_ctx->mutex); > } > > /********************************* > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > -- > 2.27.0 >
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 12, 2024 9:35 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > <kanchana.p.sridhar@intel.com> wrote: > > > > This is a hotfix for a potential zpool memory leak that could result in > > the existing zswap_decompress(): > > > > mutex_unlock(&acomp_ctx->mutex); > > > > if (src != acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > > > Releasing the lock before the conditional does not protect the integrity of > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > risk for the conditional behaving as intended, and consequently not > > unmapping the zpool handle, which could cause a zswap zpool memory > leak. > > > > This patch moves the mutex_unlock() to occur after the conditional and > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > obtained earlier, with the mutex locked, does not change. > > The commit log is too complicated and incorrect. It is talking about > the stability of 'src', but that's a local variable on the stack > anyway. It doesn't need protection. > > The problem is 'acomp_ctx->buffer' being reused and changed after the > mutex is released. Leading to the check not being reliable. Please > simplify this. Thanks Yosry. That's exactly what I meant, but I think the wording got confusing. The problem I was trying to fix is the acomp_ctx->buffer value changing after the lock is released. This could happen as a result of any other compress or decompress that acquires the lock. I will simplify and clarify accordingly. > > > > > Even though an actual memory leak was not observed, this fix seems like a > > cleaner implementation. > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > --- > > mm/zswap.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index f6316b66fb23..58810fa8ff23 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > zswap_entry *entry, struct folio *folio) > > acomp_request_set_params(acomp_ctx->req, &input, &output, entry- > >length, PAGE_SIZE); > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > >req), &acomp_ctx->wait)); > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > - mutex_unlock(&acomp_ctx->mutex); > > > > if (src != acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > Actually now that I think more about it, I think this check isn't > entirely safe, even under the lock. Is it possible that > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > decompression at the same handle? In this case, we will also > mistakenly skip the unmap. If we move the mutex_unlock to happen after the conditional and unmap, shouldn't that be sufficient under all conditions? With the fix, "src" can take on only 2 values in this procedure: the mapped handle or acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case if the mapped zpool handle happens to be equal to acomp_ctx->buffer. Please let me know if I am missing anything. > > It would be more reliable to set a boolean variable if we copy to > acomp_ctx->buffer and do the unmap, and check that flag here to check > if the unmap was done or not. Sure, this could be done, but not sure if it is required. Please let me know if we still need the boolean variable in addition to moving the mutex_unlock(). Thanks, Kanchana > > > + > > + mutex_unlock(&acomp_ctx->mutex); > > } > > > > /********************************* > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > -- > > 2.27.0 > >
On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Tuesday, November 12, 2024 9:35 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > the existing zswap_decompress(): > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > if (src != acomp_ctx->buffer) > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > Releasing the lock before the conditional does not protect the integrity of > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > > risk for the conditional behaving as intended, and consequently not > > > unmapping the zpool handle, which could cause a zswap zpool memory > > leak. > > > > > > This patch moves the mutex_unlock() to occur after the conditional and > > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > > obtained earlier, with the mutex locked, does not change. > > > > The commit log is too complicated and incorrect. It is talking about > > the stability of 'src', but that's a local variable on the stack > > anyway. It doesn't need protection. > > > > The problem is 'acomp_ctx->buffer' being reused and changed after the > > mutex is released. Leading to the check not being reliable. Please > > simplify this. > > Thanks Yosry. That's exactly what I meant, but I think the wording got > confusing. The problem I was trying to fix is the acomp_ctx->buffer > value changing after the lock is released. This could happen as a result of any > other compress or decompress that acquires the lock. I will simplify and > clarify accordingly. > > > > > > > > > Even though an actual memory leak was not observed, this fix seems like a > > > cleaner implementation. > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > --- > > > mm/zswap.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index f6316b66fb23..58810fa8ff23 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > zswap_entry *entry, struct folio *folio) > > > acomp_request_set_params(acomp_ctx->req, &input, &output, entry- > > >length, PAGE_SIZE); > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > >req), &acomp_ctx->wait)); > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > if (src != acomp_ctx->buffer) > > > zpool_unmap_handle(zpool, entry->handle); > > > > Actually now that I think more about it, I think this check isn't > > entirely safe, even under the lock. Is it possible that > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > decompression at the same handle? In this case, we will also > > mistakenly skip the unmap. > > If we move the mutex_unlock to happen after the conditional and unmap, > shouldn't that be sufficient under all conditions? With the fix, "src" can > take on only 2 values in this procedure: the mapped handle or > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. Yes, that's the scenario I mean. Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' happens to be equal to the same handle from a previous operation as we don't clear it. In this case, the 'src != acomp_ctx->buffer' check will be false, even though it should be true. This will result in an extra zpool_unmap_handle() call. I didn't look closely, but this seems like it can have a worse effect than leaking memory (e.g. there will be an extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting a random handle). > > Please let me know if I am missing anything. > > > > > It would be more reliable to set a boolean variable if we copy to > > acomp_ctx->buffer and do the unmap, and check that flag here to check > > if the unmap was done or not. > > Sure, this could be done, but not sure if it is required. Please let me know > if we still need the boolean variable in addition to moving the mutex_unlock(). If we use a boolean, there is no need to move mutex_unlock(). The boolean will be a local variable on the stack that doesn't need protection. > > Thanks, > Kanchana > > > > > > + > > > + mutex_unlock(&acomp_ctx->mutex); > > > } > > > > > > /********************************* > > > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > > -- > > > 2.27.0 > > >
Hi Yosry, > -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 12, 2024 10:22 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > hannes@cmpxchg.org; nphamcs@gmail.com; > chengming.zhou@linux.dev; > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > foundation.org; > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > the existing zswap_decompress(): > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > if (src != acomp_ctx->buffer) > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > Releasing the lock before the conditional does not protect the integrity > of > > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > > > risk for the conditional behaving as intended, and consequently not > > > > unmapping the zpool handle, which could cause a zswap zpool memory > > > leak. > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional and > > > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > > > obtained earlier, with the mutex locked, does not change. > > > > > > The commit log is too complicated and incorrect. It is talking about > > > the stability of 'src', but that's a local variable on the stack > > > anyway. It doesn't need protection. > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after the > > > mutex is released. Leading to the check not being reliable. Please > > > simplify this. > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > value changing after the lock is released. This could happen as a result of > any > > other compress or decompress that acquires the lock. I will simplify and > > clarify accordingly. > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > like a > > > > cleaner implementation. > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > --- > > > > mm/zswap.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > --- a/mm/zswap.c > > > > +++ b/mm/zswap.c > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > zswap_entry *entry, struct folio *folio) > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > entry- > > > >length, PAGE_SIZE); > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > >req), &acomp_ctx->wait)); > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > if (src != acomp_ctx->buffer) > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > Actually now that I think more about it, I think this check isn't > > > entirely safe, even under the lock. Is it possible that > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > decompression at the same handle? In this case, we will also > > > mistakenly skip the unmap. > > > > If we move the mutex_unlock to happen after the conditional and unmap, > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > take on only 2 values in this procedure: the mapped handle or > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > Yes, that's the scenario I mean. > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > happens to be equal to the same handle from a previous operation as we > don't clear it. Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), we only copy to it. > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > though it should be true. This will result in an extra > zpool_unmap_handle() call. I didn't look closely, but this seems like > it can have a worse effect than leaking memory (e.g. there will be an > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > a random handle). > > > > > Please let me know if I am missing anything. > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > acomp_ctx->buffer and do the unmap, and check that flag here to check > > > if the unmap was done or not. > > > > Sure, this could be done, but not sure if it is required. Please let me know > > if we still need the boolean variable in addition to moving the > mutex_unlock(). > > If we use a boolean, there is no need to move mutex_unlock(). The > boolean will be a local variable on the stack that doesn't need > protection. I agree, using the boolean variable to do the unmap rather than the check for (src != acomp_ctx->buffer) is more fail-safe. I am still thinking moving the mutex_unlock() could help, or at least have no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock safeguards the interaction between the decompress operation, the sg_*() API calls inside zswap_decompress() and the shared zpool. If we release the per-cpu acomp_ctx's mutex lock before the zpool_unmap_handle(), is it possible that another cpu could acquire it's acomp_ctx's lock and map the same zpool handle (that the earlier cpu has yet to unmap or is concurrently unmapping) for a write? If this could happen, would it result in undefined state for both these zpool ops on different cpu's? Would keeping the per-cpu mutex locked through the zpool_unmap_handle() create a non-preemptible state that would avoid this? IOW, if the above scenario is possible, does the per-cpu acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow prevented by the implementation of the zpools? Just thought I would bring up these open questions. Please do share your thoughts and advise. Thanks, Kanchana > > > > > Thanks, > > Kanchana > > > > > > > > > + > > > > + mutex_unlock(&acomp_ctx->mutex); > > > > } > > > > > > > > /********************************* > > > > > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > > > -- > > > > 2.27.0 > > > >
On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > I am still thinking moving the mutex_unlock() could help, or at least have > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > safeguards the interaction between the decompress operation, the > sg_*() API calls inside zswap_decompress() and the shared zpool. > > If we release the per-cpu acomp_ctx's mutex lock before the > zpool_unmap_handle(), is it possible that another cpu could acquire > it's acomp_ctx's lock and map the same zpool handle (that the earlier > cpu has yet to unmap or is concurrently unmapping) for a write? > If this could happen, would it result in undefined state for both > these zpool ops on different cpu's? The code is fine as is. Like you said, acomp_ctx->buffer (the pointer) doesn't change. It points to whatever was kmalloced in zswap_cpu_comp_prepare(). The handle points to backend memory. Neither of those addresses can change under us. There is no confusing them, and they cannot coincide. The mutex guards the *memory* behind the buffer, so that we don't have multiple (de)compressors stepping on each others' toes. But it's fine to drop the mutex once we're done working with the memory. We don't need the mutex to check whether src holds the acomp buffer address. That being said, I do think there is a UAF bug in CPU hotplugging. There is an acomp_ctx for each cpu, but note that this is best effort parallelism, not a guarantee that we always have the context of the local CPU. Look closely: we pick the "local" CPU with preemption enabled, then contend for the mutex. This may well put us to sleep and get us migrated, so we could be using the context of a CPU we are no longer running on. This is fine because we hold the mutex - if that other CPU tries to use the acomp_ctx, it'll wait for us. However, if we get migrated and vacate the CPU whose context we have locked, the CPU might get offlined and zswap_cpu_comp_dead() can free the context underneath us. I think we need to refcount the acomp_ctx.
> -----Original Message----- > From: Johannes Weiner <hannes@cmpxchg.org> > Sent: Wednesday, November 13, 2024 1:30 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Yosry Ahmed <yosryahmed@google.com>; linux-kernel@vger.kernel.org; > linux-mm@kvack.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > The code is fine as is. > > Like you said, acomp_ctx->buffer (the pointer) doesn't change. It > points to whatever was kmalloced in zswap_cpu_comp_prepare(). The > handle points to backend memory. Neither of those addresses can change > under us. There is no confusing them, and they cannot coincide. > > The mutex guards the *memory* behind the buffer, so that we don't have > multiple (de)compressors stepping on each others' toes. But it's fine > to drop the mutex once we're done working with the memory. We don't > need the mutex to check whether src holds the acomp buffer address. Thanks Johannes, for these insights. I was thinking of the following in zswap_decompress() as creating a non-preemptible context because of the call to raw_cpu_ptr() at the start; with this context extending until the mutex_unlock(): acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); mutex_lock(&acomp_ctx->mutex); [...] mutex_unlock(&acomp_ctx->mutex); if (src != acomp_ctx->buffer) zpool_unmap_handle(zpool, entry->handle); Based on this understanding, I was a bit worried about the "acomp_ctx->buffer" in the conditional that gates the zpool_unmap_handle() not being the same acomp_ctx as the one at the beginning. I may have been confusing myself, since the acomp_ctx is not re-evaluated before the conditional, just reused from the start. My apologies to you and Yosry! > > That being said, I do think there is a UAF bug in CPU hotplugging. > > There is an acomp_ctx for each cpu, but note that this is best effort > parallelism, not a guarantee that we always have the context of the > local CPU. Look closely: we pick the "local" CPU with preemption > enabled, then contend for the mutex. This may well put us to sleep and > get us migrated, so we could be using the context of a CPU we are no > longer running on. This is fine because we hold the mutex - if that > other CPU tries to use the acomp_ctx, it'll wait for us. > > However, if we get migrated and vacate the CPU whose context we have > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > the context underneath us. I think we need to refcount the acomp_ctx. I see. Wouldn't it then seem to make the code more fail-safe to not allow the migration to happen until after the check for (src != acomp_ctx->buffer), by moving the mutex_unlock() after this check? Or, use a boolean to determine if the unmap_handle needs to be done as Yosry suggested? Thanks, Kanchana
On Wed, Nov 13, 2024 at 2:13 PM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > Thanks Johannes, for these insights. I was thinking of the following > in zswap_decompress() as creating a non-preemptible context because > of the call to raw_cpu_ptr() at the start; with this context extending > until the mutex_unlock(): > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > mutex_lock(&acomp_ctx->mutex); > > [...] > > mutex_unlock(&acomp_ctx->mutex); > > if (src != acomp_ctx->buffer) > zpool_unmap_handle(zpool, entry->handle); > > Based on this understanding, I was a bit worried about the > "acomp_ctx->buffer" in the conditional that gates the > zpool_unmap_handle() not being the same acomp_ctx as the one > at the beginning. I may have been confusing myself, since the acomp_ctx > is not re-evaluated before the conditional, just reused from the > start. My apologies to you and Yosry! > > > > > That being said, I do think there is a UAF bug in CPU hotplugging. > > > > There is an acomp_ctx for each cpu, but note that this is best effort > > parallelism, not a guarantee that we always have the context of the > > local CPU. Look closely: we pick the "local" CPU with preemption > > enabled, then contend for the mutex. This may well put us to sleep and > > get us migrated, so we could be using the context of a CPU we are no > > longer running on. This is fine because we hold the mutex - if that > > other CPU tries to use the acomp_ctx, it'll wait for us. > > > > However, if we get migrated and vacate the CPU whose context we have > > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > > the context underneath us. I think we need to refcount the acomp_ctx. > > I see. Wouldn't it then seem to make the code more fail-safe to not allow > the migration to happen until after the check for (src != acomp_ctx->buffer), by > moving the mutex_unlock() after this check? Or, use a boolean to determine > if the unmap_handle needs to be done as Yosry suggested? Hmm does it make it safe? It is mutex_lock() that puts the task to sleep, after which it can get migrated to a different CPU. Moving mutex_unlock() to below or not doesn't really matter, no? Or am I missing something here... I think Johannes' proposal is the safest :)
> -----Original Message----- > From: Nhat Pham <nphamcs@gmail.com> > Sent: Wednesday, November 13, 2024 4:29 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Johannes Weiner <hannes@cmpxchg.org>; Yosry Ahmed > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 2:13 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > Thanks Johannes, for these insights. I was thinking of the following > > in zswap_decompress() as creating a non-preemptible context because > > of the call to raw_cpu_ptr() at the start; with this context extending > > until the mutex_unlock(): > > > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > > mutex_lock(&acomp_ctx->mutex); > > > > [...] > > > > mutex_unlock(&acomp_ctx->mutex); > > > > if (src != acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > > > Based on this understanding, I was a bit worried about the > > "acomp_ctx->buffer" in the conditional that gates the > > zpool_unmap_handle() not being the same acomp_ctx as the one > > at the beginning. I may have been confusing myself, since the acomp_ctx > > is not re-evaluated before the conditional, just reused from the > > start. My apologies to you and Yosry! > > > > > > > > That being said, I do think there is a UAF bug in CPU hotplugging. > > > > > > There is an acomp_ctx for each cpu, but note that this is best effort > > > parallelism, not a guarantee that we always have the context of the > > > local CPU. Look closely: we pick the "local" CPU with preemption > > > enabled, then contend for the mutex. This may well put us to sleep and > > > get us migrated, so we could be using the context of a CPU we are no > > > longer running on. This is fine because we hold the mutex - if that > > > other CPU tries to use the acomp_ctx, it'll wait for us. > > > > > > However, if we get migrated and vacate the CPU whose context we have > > > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > > > the context underneath us. I think we need to refcount the acomp_ctx. > > > > I see. Wouldn't it then seem to make the code more fail-safe to not allow > > the migration to happen until after the check for (src != acomp_ctx->buffer), > by > > moving the mutex_unlock() after this check? Or, use a boolean to determine > > if the unmap_handle needs to be done as Yosry suggested? > > Hmm does it make it safe? It is mutex_lock() that puts the task to > sleep, after which it can get migrated to a different CPU. Moving > mutex_unlock() to below or not doesn't really matter, no? Or am I > missing something here... Thanks for the comments, Nhat. I guess my last comment in response to what Johannes mentioned about the UAF situation, is the one that's still an open from my perspective. If the process can get migrated after the mutex unlock, and if the acomp_ctx obtained earlier gets deleted (as Johannes was mentioning), then we could be accessing an invalid pointer in the "if (src != acomp_ctx->buffer)". So my question was, can we prevent the migration to a different cpu by relinquishing the mutex lock after this conditional; or, make the conditional be determined by a boolean that gets set earlier to decide whether or not to unmap the zpool handle (as against using the acomp_ctx to decide this). Thanks, Kanchana > > I think Johannes' proposal is the safest :)
On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: > So my question was, can we prevent the migration to a different cpu > by relinquishing the mutex lock after this conditional Holding the mutex doesn't prevent preemption/migration.
> -----Original Message----- > From: Johannes Weiner <hannes@cmpxchg.org> > Sent: Wednesday, November 13, 2024 9:12 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: > > So my question was, can we prevent the migration to a different cpu > > by relinquishing the mutex lock after this conditional > > Holding the mutex doesn't prevent preemption/migration. Sure, however, is this also applicable to holding the mutex of a per-cpu structure obtained via raw_cpu_ptr()? Would holding the mutex prevent the acomp_ctx of the cpu prior to the migration (in the UAF scenario you described) from being deleted? If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the UAF, I agree, we might need a way to prevent the acomp_ctx from being deleted, e.g. with refcounts as you've suggested, or to not use the acomp_ctx at all for the check, instead use a boolean. Thanks, Kanchana
Hello, On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > >> -----Original Message----- >> From: Johannes Weiner <hannes@cmpxchg.org> >> Sent: Wednesday, November 13, 2024 9:12 PM >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- >> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >> zswap_decompress(). >> >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: >>> So my question was, can we prevent the migration to a different cpu >>> by relinquishing the mutex lock after this conditional >> >> Holding the mutex doesn't prevent preemption/migration. > > Sure, however, is this also applicable to holding the mutex of a per-cpu > structure obtained via raw_cpu_ptr()? Yes, unless you use migration_disable() or cpus_read_lock() to protect this section. > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > the migration (in the UAF scenario you described) from being deleted? No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the > UAF, I agree, we might need a way to prevent the acomp_ctx from being > deleted, e.g. with refcounts as you've suggested, or to not use the Right, refcount solution from Johannes is very good IMHO. > acomp_ctx at all for the check, instead use a boolean. But this is not enough to just avoid using acomp_ctx for the check, the usage of acomp_ctx inside the mutex is also UAF, since cpu offline can kick in anytime to free the acomp_ctx->buffer. Thanks.
Hi Chengming, > -----Original Message----- > From: Chengming Zhou <chengming.zhou@linux.dev> > Sent: Wednesday, November 13, 2024 11:24 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes Weiner > <hannes@cmpxchg.org> > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > Hello, > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > >> -----Original Message----- > >> From: Johannes Weiner <hannes@cmpxchg.org> > >> Sent: Wednesday, November 13, 2024 9:12 PM > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > >> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > >> zswap_decompress(). > >> > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: > >>> So my question was, can we prevent the migration to a different cpu > >>> by relinquishing the mutex lock after this conditional > >> > >> Holding the mutex doesn't prevent preemption/migration. > > > > Sure, however, is this also applicable to holding the mutex of a per-cpu > > structure obtained via raw_cpu_ptr()? > > Yes, unless you use migration_disable() or cpus_read_lock() to protect > this section. Ok. > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > > the migration (in the UAF scenario you described) from being deleted? > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the > > UAF, I agree, we might need a way to prevent the acomp_ctx from being > > deleted, e.g. with refcounts as you've suggested, or to not use the > > Right, refcount solution from Johannes is very good IMHO. > > > acomp_ctx at all for the check, instead use a boolean. > > But this is not enough to just avoid using acomp_ctx for the check, > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline > can kick in anytime to free the acomp_ctx->buffer. I see. How would the refcounts work? Would this add latency to zswap ops? In low memory situations, could the cpu offlining code over-ride the refcounts? Based on Johannes' earlier comments, I don't think it makes sense for me to submit a v2. Thanks, Kanchana > > Thanks.
On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > Hi Chengming, > > > -----Original Message----- > > From: Chengming Zhou <chengming.zhou@linux.dev> > > Sent: Wednesday, November 13, 2024 11:24 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes Weiner > > <hannes@cmpxchg.org> > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > Hello, > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > >> -----Original Message----- > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > >> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com; > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > >> zswap_decompress(). > > >> > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote: > > >>> So my question was, can we prevent the migration to a different cpu > > >>> by relinquishing the mutex lock after this conditional > > >> > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > Sure, however, is this also applicable to holding the mutex of a per-cpu > > > structure obtained via raw_cpu_ptr()? > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect > > this section. > > Ok. > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > > > the migration (in the UAF scenario you described) from being deleted? > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the > > > UAF, I agree, we might need a way to prevent the acomp_ctx from being > > > deleted, e.g. with refcounts as you've suggested, or to not use the > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > But this is not enough to just avoid using acomp_ctx for the check, > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline > > can kick in anytime to free the acomp_ctx->buffer. > > I see. How would the refcounts work? Would this add latency to zswap > ops? In low memory situations, could the cpu offlining code over-ride > the refcounts? I think what Johannes meant is that the zswap compress/decompress paths grab a ref on the acomp_ctx before using it, and the CPU offlining code only drops the initial ref, and does not free the buffer directly. The buffer is only freed when the ref drops to zero. I am not familiar with CPU hotplug, would it be simpler if we have a wrapper like get_acomp_ctx() that disables migration or calls cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar wrapper, put_acompt_ctx() will be used after we are done using the acomp_ctx. > > Based on Johannes' earlier comments, I don't think it makes sense for > me to submit a v2. > > Thanks, > Kanchana > > > > > Thanks.
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Friday, November 15, 2024 1:49 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > Hi Chengming, > > > > > -----Original Message----- > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes > Weiner > > > <hannes@cmpxchg.org> > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; > Huang, > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, > Vinodh > > > <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > Hello, > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > >> -----Original Message----- > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > >> mm@kvack.org; chengming.zhou@linux.dev; > usamaarif642@gmail.com; > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > >> zswap_decompress(). > > > >> > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P > wrote: > > > >>> So my question was, can we prevent the migration to a different cpu > > > >>> by relinquishing the mutex lock after this conditional > > > >> > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > Sure, however, is this also applicable to holding the mutex of a per-cpu > > > > structure obtained via raw_cpu_ptr()? > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect > > > this section. > > > > Ok. > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > > > > the migration (in the UAF scenario you described) from being deleted? > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from being > > > > deleted, e.g. with refcounts as you've suggested, or to not use the > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline > > > can kick in anytime to free the acomp_ctx->buffer. > > > > I see. How would the refcounts work? Would this add latency to zswap > > ops? In low memory situations, could the cpu offlining code over-ride > > the refcounts? > > I think what Johannes meant is that the zswap compress/decompress > paths grab a ref on the acomp_ctx before using it, and the CPU > offlining code only drops the initial ref, and does not free the > buffer directly. The buffer is only freed when the ref drops to zero. > > I am not familiar with CPU hotplug, would it be simpler if we have a > wrapper like get_acomp_ctx() that disables migration or calls > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > wrapper, put_acompt_ctx() will be used after we are done using the > acomp_ctx. Would it be sufficient to add a check for mutex_is_locked() in zswap_cpu_comp_dead() and if this returns true, to exit without deleting the acomp? If this is an acceptable solution, it would also require us to move the mutex_unlock() to occur after the "if (src != acomp_ctx->buffer)" in zswap_decompress(). This would ensure all existing zswap code that's within the mutex_lock()-mutex_unlock() will work correctly without worrying about the acomp_ctx being deleted by cpu offlining. Not sure if this would be a comprehensive solution, or if it would have unintended consequences to the cpu offlining code. Would appreciate comments. Thanks, Kanchana > > > > > Based on Johannes' earlier comments, I don't think it makes sense for > > me to submit a v2. > > > > Thanks, > > Kanchana > > > > > > > > Thanks.
On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Friday, November 15, 2024 1:49 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > Hi Chengming, > > > > > > > -----Original Message----- > > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes > > Weiner > > > > <hannes@cmpxchg.org> > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; > > Huang, > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, > > Vinodh > > > > <vinodh.gopal@intel.com> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > Hello, > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > >> -----Original Message----- > > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > usamaarif642@gmail.com; > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > >> zswap_decompress(). > > > > >> > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P > > wrote: > > > > >>> So my question was, can we prevent the migration to a different cpu > > > > >>> by relinquishing the mutex lock after this conditional > > > > >> > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a per-cpu > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect > > > > this section. > > > > > > Ok. > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > > > > > the migration (in the UAF scenario you described) from being deleted? > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from being > > > > > deleted, e.g. with refcounts as you've suggested, or to not use the > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > I see. How would the refcounts work? Would this add latency to zswap > > > ops? In low memory situations, could the cpu offlining code over-ride > > > the refcounts? > > > > I think what Johannes meant is that the zswap compress/decompress > > paths grab a ref on the acomp_ctx before using it, and the CPU > > offlining code only drops the initial ref, and does not free the > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > I am not familiar with CPU hotplug, would it be simpler if we have a > > wrapper like get_acomp_ctx() that disables migration or calls > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > > wrapper, put_acompt_ctx() will be used after we are done using the > > acomp_ctx. > > Would it be sufficient to add a check for mutex_is_locked() in > zswap_cpu_comp_dead() and if this returns true, to exit without deleting > the acomp? I don't think this works. First of all, it's racy. It's possible the mutex gets locked after we check mutex_is_locked() but before we delete the acomp_ctx. Also, if we find that the mutex is locked, then we do nothing and essentially leak the memory. Second, and probably more important, this only checks if anyone is currently holding the mutex. What about tasks that may be sleeping waiting for the mutex to be unlocked? The mutex will be deleted from under them as well. > If this is an acceptable solution, it would also require us > to move the mutex_unlock() to occur after the "if (src != acomp_ctx->buffer)" > in zswap_decompress(). This would ensure all existing zswap code that's > within the mutex_lock()-mutex_unlock() will work correctly without > worrying about the acomp_ctx being deleted by cpu offlining. > > Not sure if this would be a comprehensive solution, or if it would have > unintended consequences to the cpu offlining code. Would appreciate > comments. > > Thanks, > Kanchana
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 19, 2024 11:27 AM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Friday, November 15, 2024 1:49 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > Hi Chengming, > > > > > > > > > -----Original Message----- > > > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes > > > Weiner > > > > > <hannes@cmpxchg.org> > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; > > > Huang, > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, > > > Vinodh > > > > > <vinodh.gopal@intel.com> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > zswap_decompress(). > > > > > > > > > > Hello, > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > >> -----Original Message----- > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > > usamaarif642@gmail.com; > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > >> zswap_decompress(). > > > > > >> > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P > > > wrote: > > > > > >>> So my question was, can we prevent the migration to a different > cpu > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > >> > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a per- > cpu > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect > > > > > this section. > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > > > > > > the migration (in the UAF scenario you described) from being > deleted? > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent > the > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from > being > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use the > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > I see. How would the refcounts work? Would this add latency to zswap > > > > ops? In low memory situations, could the cpu offlining code over-ride > > > > the refcounts? > > > > > > I think what Johannes meant is that the zswap compress/decompress > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > offlining code only drops the initial ref, and does not free the > > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a > > > wrapper like get_acomp_ctx() that disables migration or calls > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > > > wrapper, put_acompt_ctx() will be used after we are done using the > > > acomp_ctx. > > > > Would it be sufficient to add a check for mutex_is_locked() in > > zswap_cpu_comp_dead() and if this returns true, to exit without deleting > > the acomp? > > I don't think this works. First of all, it's racy. It's possible the > mutex gets locked after we check mutex_is_locked() but before we > delete the acomp_ctx. Also, if we find that the mutex is locked, then > we do nothing and essentially leak the memory. Yes, this would assume the cpu offlining code retries at some interval, which could prevent the memory leak. > > Second, and probably more important, this only checks if anyone is > currently holding the mutex. What about tasks that may be sleeping > waiting for the mutex to be unlocked? The mutex will be deleted from > under them as well. Wouldn't this and the race described above, also be issues for the refcount based approach? Also, I am wondering if the mutex design already handles cases where tasks are sleeping, waiting for a mutex that disappears? Thanks, Kanchana > > > If this is an acceptable solution, it would also require us > > to move the mutex_unlock() to occur after the "if (src != acomp_ctx- > >buffer)" > > in zswap_decompress(). This would ensure all existing zswap code that's > > within the mutex_lock()-mutex_unlock() will work correctly without > > worrying about the acomp_ctx being deleted by cpu offlining. > > > > Not sure if this would be a comprehensive solution, or if it would have > > unintended consequences to the cpu offlining code. Would appreciate > > comments. > > > > Thanks, > > Kanchana
On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Tuesday, November 19, 2024 11:27 AM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > Sent: Friday, November 15, 2024 1:49 PM > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > Hi Chengming, > > > > > > > > > > > -----Original Message----- > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes > > > > Weiner > > > > > > <hannes@cmpxchg.org> > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; > > > > Huang, > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, > > > > Vinodh > > > > > > <vinodh.gopal@intel.com> > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > zswap_decompress(). > > > > > > > > > > > > Hello, > > > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > > > >> -----Original Message----- > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > > > usamaarif642@gmail.com; > > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > >> zswap_decompress(). > > > > > > >> > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P > > > > wrote: > > > > > > >>> So my question was, can we prevent the migration to a different > > cpu > > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > > >> > > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a per- > > cpu > > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect > > > > > > this section. > > > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to > > > > > > > the migration (in the UAF scenario you described) from being > > deleted? > > > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent > > the > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from > > being > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use the > > > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline > > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > I see. How would the refcounts work? Would this add latency to zswap > > > > > ops? In low memory situations, could the cpu offlining code over-ride > > > > > the refcounts? > > > > > > > > I think what Johannes meant is that the zswap compress/decompress > > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > > offlining code only drops the initial ref, and does not free the > > > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a > > > > wrapper like get_acomp_ctx() that disables migration or calls > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > > > > wrapper, put_acompt_ctx() will be used after we are done using the > > > > acomp_ctx. > > > > > > Would it be sufficient to add a check for mutex_is_locked() in > > > zswap_cpu_comp_dead() and if this returns true, to exit without deleting > > > the acomp? > > > > I don't think this works. First of all, it's racy. It's possible the > > mutex gets locked after we check mutex_is_locked() but before we > > delete the acomp_ctx. Also, if we find that the mutex is locked, then > > we do nothing and essentially leak the memory. > > Yes, this would assume the cpu offlining code retries at some interval, > which could prevent the memory leak. I am not sure about that, but even so, it wouldn't handle the first scenario where the mutex gets locked after we check mutex_is_locked(). > > > > > Second, and probably more important, this only checks if anyone is > > currently holding the mutex. What about tasks that may be sleeping > > waiting for the mutex to be unlocked? The mutex will be deleted from > > under them as well. > > Wouldn't this and the race described above, also be issues for the > refcount based approach? I don't think so, at least if implemented correctly. There are a lot of examples around the kernel that use RCU + refcounts for such use cases. I think there are also some examples in kernel docs. That being said, I am wondering if we can get away with something simpler like holding the cpus read lock or disabling migration as I suggested earlier, but I am not quite sure. > > Also, I am wondering if the mutex design already handles cases where > tasks are sleeping, waiting for a mutex that disappears? I don't believe so. It doesn't make sense for someone to free a mutex while someone is waiting for it. How would the waiter know if the memory backing the mutex was freed?
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 19, 2024 11:51 AM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Tuesday, November 19, 2024 11:27 AM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > > Sent: Friday, November 15, 2024 1:49 PM > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > > > kernel@vger.kernel.org; linux-mm@kvack.org; > usamaarif642@gmail.com; > > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > zswap_decompress(). > > > > > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > Hi Chengming, > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; > Johannes > > > > > Weiner > > > > > > > <hannes@cmpxchg.org> > > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > > > > mm@kvack.org; usamaarif642@gmail.com; > ryan.roberts@arm.com; > > > > > Huang, > > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; > Gopal, > > > > > Vinodh > > > > > > > <vinodh.gopal@intel.com> > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > > zswap_decompress(). > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; > linux- > > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > > > > usamaarif642@gmail.com; > > > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, > Wajdi K > > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory > leak in > > > > > > > >> zswap_decompress(). > > > > > > > >> > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana > P > > > > > wrote: > > > > > > > >>> So my question was, can we prevent the migration to a > different > > > cpu > > > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > > > >> > > > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a > per- > > > cpu > > > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to > protect > > > > > > > this section. > > > > > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior > to > > > > > > > > the migration (in the UAF scenario you described) from being > > > deleted? > > > > > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to > prevent > > > the > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx > from > > > being > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use > the > > > > > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu > offline > > > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > I see. How would the refcounts work? Would this add latency to > zswap > > > > > > ops? In low memory situations, could the cpu offlining code over-ride > > > > > > the refcounts? > > > > > > > > > > I think what Johannes meant is that the zswap compress/decompress > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > > > offlining code only drops the initial ref, and does not free the > > > > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a > > > > > wrapper like get_acomp_ctx() that disables migration or calls > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > > > > > wrapper, put_acompt_ctx() will be used after we are done using the > > > > > acomp_ctx. > > > > > > > > Would it be sufficient to add a check for mutex_is_locked() in > > > > zswap_cpu_comp_dead() and if this returns true, to exit without > deleting > > > > the acomp? > > > > > > I don't think this works. First of all, it's racy. It's possible the > > > mutex gets locked after we check mutex_is_locked() but before we > > > delete the acomp_ctx. Also, if we find that the mutex is locked, then > > > we do nothing and essentially leak the memory. > > > > Yes, this would assume the cpu offlining code retries at some interval, > > which could prevent the memory leak. > > I am not sure about that, but even so, it wouldn't handle the first > scenario where the mutex gets locked after we check mutex_is_locked(). > > > > > > > > > Second, and probably more important, this only checks if anyone is > > > currently holding the mutex. What about tasks that may be sleeping > > > waiting for the mutex to be unlocked? The mutex will be deleted from > > > under them as well. > > > > Wouldn't this and the race described above, also be issues for the > > refcount based approach? > > I don't think so, at least if implemented correctly. There are a lot > of examples around the kernel that use RCU + refcounts for such use > cases. I think there are also some examples in kernel docs. > > That being said, I am wondering if we can get away with something > simpler like holding the cpus read lock or disabling migration as I > suggested earlier, but I am not quite sure. Another idea to consider is how zsmalloc avoids this issue through its use of the local_lock() on the per-cpu mapping area. This disables preemption from zs_map_object() through zs_unmap_object(). Would changing the acomp_ctx's mutex to a local_lock solve the problem? > > > > > Also, I am wondering if the mutex design already handles cases where > > tasks are sleeping, waiting for a mutex that disappears? > > I don't believe so. It doesn't make sense for someone to free a mutex > while someone is waiting for it. How would the waiter know if the > memory backing the mutex was freed? Thanks Yosry, all good points. There would need to be some sort of arbiter (for e.g., the cpu offlining code) that would reschedule tasks running on a cpu before shutting it down, which could address this specific issue. I was thinking these are not problems unique to zswap's per-cpu acomp_ctx->mutex wrt the offlining? Thanks, Kanchana
On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Tuesday, November 19, 2024 11:51 AM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > Sent: Tuesday, November 19, 2024 11:27 AM > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > > > Sent: Friday, November 15, 2024 1:49 PM > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > > > > kernel@vger.kernel.org; linux-mm@kvack.org; > > usamaarif642@gmail.com; > > > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > zswap_decompress(). > > > > > > > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > > Hi Chengming, > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; > > Johannes > > > > > > Weiner > > > > > > > > <hannes@cmpxchg.org> > > > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- > > > > > > > > mm@kvack.org; usamaarif642@gmail.com; > > ryan.roberts@arm.com; > > > > > > Huang, > > > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; > > Gopal, > > > > > > Vinodh > > > > > > > > <vinodh.gopal@intel.com> > > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > > > zswap_decompress(). > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; > > linux- > > > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > > > > > usamaarif642@gmail.com; > > > > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, > > Wajdi K > > > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > > <vinodh.gopal@intel.com> > > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory > > leak in > > > > > > > > >> zswap_decompress(). > > > > > > > > >> > > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana > > P > > > > > > wrote: > > > > > > > > >>> So my question was, can we prevent the migration to a > > different > > > > cpu > > > > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > > > > >> > > > > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a > > per- > > > > cpu > > > > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to > > protect > > > > > > > > this section. > > > > > > > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior > > to > > > > > > > > > the migration (in the UAF scenario you described) from being > > > > deleted? > > > > > > > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to > > prevent > > > > the > > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx > > from > > > > being > > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use > > the > > > > > > > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check, > > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu > > offline > > > > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > I see. How would the refcounts work? Would this add latency to > > zswap > > > > > > > ops? In low memory situations, could the cpu offlining code over-ride > > > > > > > the refcounts? > > > > > > > > > > > > I think what Johannes meant is that the zswap compress/decompress > > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > > > > offlining code only drops the initial ref, and does not free the > > > > > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > > > > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a > > > > > > wrapper like get_acomp_ctx() that disables migration or calls > > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar > > > > > > wrapper, put_acompt_ctx() will be used after we are done using the > > > > > > acomp_ctx. > > > > > > > > > > Would it be sufficient to add a check for mutex_is_locked() in > > > > > zswap_cpu_comp_dead() and if this returns true, to exit without > > deleting > > > > > the acomp? > > > > > > > > I don't think this works. First of all, it's racy. It's possible the > > > > mutex gets locked after we check mutex_is_locked() but before we > > > > delete the acomp_ctx. Also, if we find that the mutex is locked, then > > > > we do nothing and essentially leak the memory. > > > > > > Yes, this would assume the cpu offlining code retries at some interval, > > > which could prevent the memory leak. > > > > I am not sure about that, but even so, it wouldn't handle the first > > scenario where the mutex gets locked after we check mutex_is_locked(). > > > > > > > > > > > > > Second, and probably more important, this only checks if anyone is > > > > currently holding the mutex. What about tasks that may be sleeping > > > > waiting for the mutex to be unlocked? The mutex will be deleted from > > > > under them as well. > > > > > > Wouldn't this and the race described above, also be issues for the > > > refcount based approach? > > > > I don't think so, at least if implemented correctly. There are a lot > > of examples around the kernel that use RCU + refcounts for such use > > cases. I think there are also some examples in kernel docs. > > > > That being said, I am wondering if we can get away with something > > simpler like holding the cpus read lock or disabling migration as I > > suggested earlier, but I am not quite sure. > > Another idea to consider is how zsmalloc avoids this issue through > its use of the local_lock() on the per-cpu mapping area. This disables > preemption from zs_map_object() through zs_unmap_object(). > Would changing the acomp_ctx's mutex to a local_lock solve the > problem? This is similar to disabling migration as I suggested, but disabling preemption means that we cannot sleep, we spin on a lock instead. In zswap_compress(), we send the compression request and may sleep waiting for it. We also make a non-atomic allocation later that may also sleep but that's less of a problem. In zswap_decompress() we may also sleep, which is why we sometimes copy the data into acomp_ctx->buffer and unmap the handle to begin with. So I don't think we can just replace the mutex with a lock. > > > > > > > > > Also, I am wondering if the mutex design already handles cases where > > > tasks are sleeping, waiting for a mutex that disappears? > > > > I don't believe so. It doesn't make sense for someone to free a mutex > > while someone is waiting for it. How would the waiter know if the > > memory backing the mutex was freed? > > Thanks Yosry, all good points. There would need to be some sort of > arbiter (for e.g., the cpu offlining code) that would reschedule tasks > running on a cpu before shutting it down, which could address > this specific issue. I was thinking these are not problems unique to > zswap's per-cpu acomp_ctx->mutex wrt the offlining? There are a few reasons why zswap has this problem and other code may not have it. For example the data structure is dynamically allocated and is freed during offlining, it wouldn't be a problem if it was static. Also the fact that we don't disable preemption when accessing the per-CPU data, as I mentioned earlier, which would prevent the CPU we are running on from going offline while we access the per-CPU data. I think we should either: (a) Use refcounts. (b) Disable migration. (c) Hold the CPUs read lock. I was hoping someone with more knowledge about CPU offlining would confirm (b) and (c) would work, but I am pretty confident they would.
On 2024/11/20 07:44, Yosry Ahmed wrote: > On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: >> >> >>> -----Original Message----- >>> From: Yosry Ahmed <yosryahmed@google.com> >>> Sent: Tuesday, November 19, 2024 11:51 AM >>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> >>> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner >>> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- >>> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; >>> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; >>> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh >>> <vinodh.gopal@intel.com> >>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>> zswap_decompress(). >>> >>> On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P >>> <kanchana.p.sridhar@intel.com> wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Yosry Ahmed <yosryahmed@google.com> >>>>> Sent: Tuesday, November 19, 2024 11:27 AM >>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> >>>>> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner >>>>> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- >>>>> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; >>>>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; >>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K >>>>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> >>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>>>> zswap_decompress(). >>>>> >>>>> On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P >>>>> <kanchana.p.sridhar@intel.com> wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Yosry Ahmed <yosryahmed@google.com> >>>>>>> Sent: Friday, November 15, 2024 1:49 PM >>>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> >>>>>>> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner >>>>>>> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- >>>>>>> kernel@vger.kernel.org; linux-mm@kvack.org; >>> usamaarif642@gmail.com; >>>>>>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; >>>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K >>>>>>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh >>> <vinodh.gopal@intel.com> >>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>>>>>> zswap_decompress(). >>>>>>> >>>>>>> On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P >>>>>>> <kanchana.p.sridhar@intel.com> wrote: >>>>>>>> >>>>>>>> Hi Chengming, >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Chengming Zhou <chengming.zhou@linux.dev> >>>>>>>>> Sent: Wednesday, November 13, 2024 11:24 PM >>>>>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; >>> Johannes >>>>>>> Weiner >>>>>>>>> <hannes@cmpxchg.org> >>>>>>>>> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed >>>>>>>>> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux- >>>>>>>>> mm@kvack.org; usamaarif642@gmail.com; >>> ryan.roberts@arm.com; >>>>>>> Huang, >>>>>>>>> Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- >>>>>>>>> foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; >>> Gopal, >>>>>>> Vinodh >>>>>>>>> <vinodh.gopal@intel.com> >>>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in >>>>>>>>> zswap_decompress(). >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> On 2024/11/14 14:37, Sridhar, Kanchana P wrote: >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Johannes Weiner <hannes@cmpxchg.org> >>>>>>>>>>> Sent: Wednesday, November 13, 2024 9:12 PM >>>>>>>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> >>>>>>>>>>> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed >>>>>>>>>>> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; >>> linux- >>>>>>>>>>> mm@kvack.org; chengming.zhou@linux.dev; >>>>>>> usamaarif642@gmail.com; >>>>>>>>>>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; >>>>>>>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, >>> Wajdi K >>>>>>>>>>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh >>>>> <vinodh.gopal@intel.com> >>>>>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory >>> leak in >>>>>>>>>>> zswap_decompress(). >>>>>>>>>>> >>>>>>>>>>> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana >>> P >>>>>>> wrote: >>>>>>>>>>>> So my question was, can we prevent the migration to a >>> different >>>>> cpu >>>>>>>>>>>> by relinquishing the mutex lock after this conditional >>>>>>>>>>> >>>>>>>>>>> Holding the mutex doesn't prevent preemption/migration. >>>>>>>>>> >>>>>>>>>> Sure, however, is this also applicable to holding the mutex of a >>> per- >>>>> cpu >>>>>>>>>> structure obtained via raw_cpu_ptr()? >>>>>>>>> >>>>>>>>> Yes, unless you use migration_disable() or cpus_read_lock() to >>> protect >>>>>>>>> this section. >>>>>>>> >>>>>>>> Ok. >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Would holding the mutex prevent the acomp_ctx of the cpu prior >>> to >>>>>>>>>> the migration (in the UAF scenario you described) from being >>>>> deleted? >>>>>>>>> >>>>>>>>> No, cpu offline can kick in anytime to free the acomp_ctx->buffer. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> If holding the per-cpu acomp_ctx's mutex isn't sufficient to >>> prevent >>>>> the >>>>>>>>>> UAF, I agree, we might need a way to prevent the acomp_ctx >>> from >>>>> being >>>>>>>>>> deleted, e.g. with refcounts as you've suggested, or to not use >>> the >>>>>>>>> >>>>>>>>> Right, refcount solution from Johannes is very good IMHO. >>>>>>>>> >>>>>>>>>> acomp_ctx at all for the check, instead use a boolean. >>>>>>>>> >>>>>>>>> But this is not enough to just avoid using acomp_ctx for the check, >>>>>>>>> the usage of acomp_ctx inside the mutex is also UAF, since cpu >>> offline >>>>>>>>> can kick in anytime to free the acomp_ctx->buffer. >>>>>>>> >>>>>>>> I see. How would the refcounts work? Would this add latency to >>> zswap >>>>>>>> ops? In low memory situations, could the cpu offlining code over-ride >>>>>>>> the refcounts? >>>>>>> >>>>>>> I think what Johannes meant is that the zswap compress/decompress >>>>>>> paths grab a ref on the acomp_ctx before using it, and the CPU >>>>>>> offlining code only drops the initial ref, and does not free the >>>>>>> buffer directly. The buffer is only freed when the ref drops to zero. >>>>>>> >>>>>>> I am not familiar with CPU hotplug, would it be simpler if we have a >>>>>>> wrapper like get_acomp_ctx() that disables migration or calls >>>>>>> cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar >>>>>>> wrapper, put_acompt_ctx() will be used after we are done using the >>>>>>> acomp_ctx. >>>>>> >>>>>> Would it be sufficient to add a check for mutex_is_locked() in >>>>>> zswap_cpu_comp_dead() and if this returns true, to exit without >>> deleting >>>>>> the acomp? >>>>> >>>>> I don't think this works. First of all, it's racy. It's possible the >>>>> mutex gets locked after we check mutex_is_locked() but before we >>>>> delete the acomp_ctx. Also, if we find that the mutex is locked, then >>>>> we do nothing and essentially leak the memory. >>>> >>>> Yes, this would assume the cpu offlining code retries at some interval, >>>> which could prevent the memory leak. >>> >>> I am not sure about that, but even so, it wouldn't handle the first >>> scenario where the mutex gets locked after we check mutex_is_locked(). >>> >>>> >>>>> >>>>> Second, and probably more important, this only checks if anyone is >>>>> currently holding the mutex. What about tasks that may be sleeping >>>>> waiting for the mutex to be unlocked? The mutex will be deleted from >>>>> under them as well. >>>> >>>> Wouldn't this and the race described above, also be issues for the >>>> refcount based approach? >>> >>> I don't think so, at least if implemented correctly. There are a lot >>> of examples around the kernel that use RCU + refcounts for such use >>> cases. I think there are also some examples in kernel docs. >>> >>> That being said, I am wondering if we can get away with something >>> simpler like holding the cpus read lock or disabling migration as I >>> suggested earlier, but I am not quite sure. >> >> Another idea to consider is how zsmalloc avoids this issue through >> its use of the local_lock() on the per-cpu mapping area. This disables >> preemption from zs_map_object() through zs_unmap_object(). >> Would changing the acomp_ctx's mutex to a local_lock solve the >> problem? > > This is similar to disabling migration as I suggested, but disabling > preemption means that we cannot sleep, we spin on a lock instead. > > In zswap_compress(), we send the compression request and may sleep > waiting for it. We also make a non-atomic allocation later that may > also sleep but that's less of a problem. > > In zswap_decompress() we may also sleep, which is why we sometimes > copy the data into acomp_ctx->buffer and unmap the handle to begin > with. > > So I don't think we can just replace the mutex with a lock. > >> >>> >>>> >>>> Also, I am wondering if the mutex design already handles cases where >>>> tasks are sleeping, waiting for a mutex that disappears? >>> >>> I don't believe so. It doesn't make sense for someone to free a mutex >>> while someone is waiting for it. How would the waiter know if the >>> memory backing the mutex was freed? >> >> Thanks Yosry, all good points. There would need to be some sort of >> arbiter (for e.g., the cpu offlining code) that would reschedule tasks >> running on a cpu before shutting it down, which could address >> this specific issue. I was thinking these are not problems unique to >> zswap's per-cpu acomp_ctx->mutex wrt the offlining? > > There are a few reasons why zswap has this problem and other code may > not have it. For example the data structure is dynamically allocated > and is freed during offlining, it wouldn't be a problem if it was > static. Also the fact that we don't disable preemption when accessing > the per-CPU data, as I mentioned earlier, which would prevent the CPU > we are running on from going offline while we access the per-CPU data. > > I think we should either: > (a) Use refcounts. > (b) Disable migration. > (c) Hold the CPUs read lock. > > I was hoping someone with more knowledge about CPU offlining would > confirm (b) and (c) would work, but I am pretty confident they would. +1, I also think (b) or (c) should work with my limited knowledge about CPU offlining :-) And they seem simpler than refcounts implementation. Thanks.
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Tuesday, November 19, 2024 3:45 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Tuesday, November 19, 2024 11:51 AM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com; > > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux- > foundation.org; > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > > Sent: Tuesday, November 19, 2024 11:27 AM > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner > > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux- > > > > > kernel@vger.kernel.org; linux-mm@kvack.org; > usamaarif642@gmail.com; > > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K > > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > zswap_decompress(). > > > > > > > > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P > > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > > > > Sent: Friday, November 15, 2024 1:49 PM > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes > Weiner > > > > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; > linux- > > > > > > > kernel@vger.kernel.org; linux-mm@kvack.org; > > > usamaarif642@gmail.com; > > > > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > > > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi > K > > > > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > > > zswap_decompress(). > > > > > > > > > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P > > > > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > > > > Hi Chengming, > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev> > > > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM > > > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; > > > Johannes > > > > > > > Weiner > > > > > > > > > <hannes@cmpxchg.org> > > > > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; > linux- > > > > > > > > > mm@kvack.org; usamaarif642@gmail.com; > > > ryan.roberts@arm.com; > > > > > > > Huang, > > > > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; > akpm@linux- > > > > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; > > > Gopal, > > > > > > > Vinodh > > > > > > > > > <vinodh.gopal@intel.com> > > > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory > leak in > > > > > > > > > zswap_decompress(). > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote: > > > > > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org> > > > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM > > > > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed > > > > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; > > > linux- > > > > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev; > > > > > > > usamaarif642@gmail.com; > > > > > > > > > >> ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; > > > > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, > > > Wajdi K > > > > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > > > <vinodh.gopal@intel.com> > > > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory > > > leak in > > > > > > > > > >> zswap_decompress(). > > > > > > > > > >> > > > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, > Kanchana > > > P > > > > > > > wrote: > > > > > > > > > >>> So my question was, can we prevent the migration to a > > > different > > > > > cpu > > > > > > > > > >>> by relinquishing the mutex lock after this conditional > > > > > > > > > >> > > > > > > > > > >> Holding the mutex doesn't prevent preemption/migration. > > > > > > > > > > > > > > > > > > > > Sure, however, is this also applicable to holding the mutex of > a > > > per- > > > > > cpu > > > > > > > > > > structure obtained via raw_cpu_ptr()? > > > > > > > > > > > > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to > > > protect > > > > > > > > > this section. > > > > > > > > > > > > > > > > Ok. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu > prior > > > to > > > > > > > > > > the migration (in the UAF scenario you described) from being > > > > > deleted? > > > > > > > > > > > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx- > >buffer. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to > > > prevent > > > > > the > > > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx > > > from > > > > > being > > > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not > use > > > the > > > > > > > > > > > > > > > > > > Right, refcount solution from Johannes is very good IMHO. > > > > > > > > > > > > > > > > > > > acomp_ctx at all for the check, instead use a boolean. > > > > > > > > > > > > > > > > > > But this is not enough to just avoid using acomp_ctx for the > check, > > > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu > > > offline > > > > > > > > > can kick in anytime to free the acomp_ctx->buffer. > > > > > > > > > > > > > > > > I see. How would the refcounts work? Would this add latency to > > > zswap > > > > > > > > ops? In low memory situations, could the cpu offlining code over- > ride > > > > > > > > the refcounts? > > > > > > > > > > > > > > I think what Johannes meant is that the zswap > compress/decompress > > > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU > > > > > > > offlining code only drops the initial ref, and does not free the > > > > > > > buffer directly. The buffer is only freed when the ref drops to zero. > > > > > > > > > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have > a > > > > > > > wrapper like get_acomp_ctx() that disables migration or calls > > > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A > similar > > > > > > > wrapper, put_acompt_ctx() will be used after we are done using > the > > > > > > > acomp_ctx. > > > > > > > > > > > > Would it be sufficient to add a check for mutex_is_locked() in > > > > > > zswap_cpu_comp_dead() and if this returns true, to exit without > > > deleting > > > > > > the acomp? > > > > > > > > > > I don't think this works. First of all, it's racy. It's possible the > > > > > mutex gets locked after we check mutex_is_locked() but before we > > > > > delete the acomp_ctx. Also, if we find that the mutex is locked, then > > > > > we do nothing and essentially leak the memory. > > > > > > > > Yes, this would assume the cpu offlining code retries at some interval, > > > > which could prevent the memory leak. > > > > > > I am not sure about that, but even so, it wouldn't handle the first > > > scenario where the mutex gets locked after we check mutex_is_locked(). > > > > > > > > > > > > > > > > > Second, and probably more important, this only checks if anyone is > > > > > currently holding the mutex. What about tasks that may be sleeping > > > > > waiting for the mutex to be unlocked? The mutex will be deleted from > > > > > under them as well. > > > > > > > > Wouldn't this and the race described above, also be issues for the > > > > refcount based approach? > > > > > > I don't think so, at least if implemented correctly. There are a lot > > > of examples around the kernel that use RCU + refcounts for such use > > > cases. I think there are also some examples in kernel docs. > > > > > > That being said, I am wondering if we can get away with something > > > simpler like holding the cpus read lock or disabling migration as I > > > suggested earlier, but I am not quite sure. > > > > Another idea to consider is how zsmalloc avoids this issue through > > its use of the local_lock() on the per-cpu mapping area. This disables > > preemption from zs_map_object() through zs_unmap_object(). > > Would changing the acomp_ctx's mutex to a local_lock solve the > > problem? > > This is similar to disabling migration as I suggested, but disabling > preemption means that we cannot sleep, we spin on a lock instead. > > In zswap_compress(), we send the compression request and may sleep > waiting for it. We also make a non-atomic allocation later that may > also sleep but that's less of a problem. > > In zswap_decompress() we may also sleep, which is why we sometimes > copy the data into acomp_ctx->buffer and unmap the handle to begin > with. > > So I don't think we can just replace the mutex with a lock. Thanks Yosry, for the explanations. Yes, I understand and agree. > > > > > > > > > > > > > > Also, I am wondering if the mutex design already handles cases where > > > > tasks are sleeping, waiting for a mutex that disappears? > > > > > > I don't believe so. It doesn't make sense for someone to free a mutex > > > while someone is waiting for it. How would the waiter know if the > > > memory backing the mutex was freed? > > > > Thanks Yosry, all good points. There would need to be some sort of > > arbiter (for e.g., the cpu offlining code) that would reschedule tasks > > running on a cpu before shutting it down, which could address > > this specific issue. I was thinking these are not problems unique to > > zswap's per-cpu acomp_ctx->mutex wrt the offlining? > > There are a few reasons why zswap has this problem and other code may > not have it. For example the data structure is dynamically allocated > and is freed during offlining, it wouldn't be a problem if it was > static. Also the fact that we don't disable preemption when accessing > the per-CPU data, as I mentioned earlier, which would prevent the CPU > we are running on from going offline while we access the per-CPU data. > > I think we should either: > (a) Use refcounts. > (b) Disable migration. > (c) Hold the CPUs read lock. > > I was hoping someone with more knowledge about CPU offlining would > confirm (b) and (c) would work, but I am pretty confident they would. Ok. Thanks again for the explanations. Thanks, Kanchana
On Wed, Nov 13, 2024 at 1:30 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote: > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > The code is fine as is. > > Like you said, acomp_ctx->buffer (the pointer) doesn't change. It > points to whatever was kmalloced in zswap_cpu_comp_prepare(). The > handle points to backend memory. Neither of those addresses can change > under us. There is no confusing them, and they cannot coincide. > > The mutex guards the *memory* behind the buffer, so that we don't have > multiple (de)compressors stepping on each others' toes. But it's fine > to drop the mutex once we're done working with the memory. We don't > need the mutex to check whether src holds the acomp buffer address. I have to admit that I confused myself with this alleged bug more than I like to admit :) I initially thought acomp_ctx->buffer can be changed, then when I realized it cannot be changed I did not tie that back to the 'fix' not being needed at all. I need more coffee. > > That being said, I do think there is a UAF bug in CPU hotplugging. > > There is an acomp_ctx for each cpu, but note that this is best effort > parallelism, not a guarantee that we always have the context of the > local CPU. Look closely: we pick the "local" CPU with preemption > enabled, then contend for the mutex. This may well put us to sleep and > get us migrated, so we could be using the context of a CPU we are no > longer running on. This is fine because we hold the mutex - if that > other CPU tries to use the acomp_ctx, it'll wait for us. > > However, if we get migrated and vacate the CPU whose context we have > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free > the context underneath us. I think we need to refcount the acomp_ctx.
On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > Hi Yosry, > > > -----Original Message----- > > From: Yosry Ahmed <yosryahmed@google.com> > > Sent: Tuesday, November 12, 2024 10:22 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > zswap_decompress(). > > > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > > hannes@cmpxchg.org; nphamcs@gmail.com; > > chengming.zhou@linux.dev; > > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > foundation.org; > > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > > <vinodh.gopal@intel.com> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > zswap_decompress(). > > > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > > the existing zswap_decompress(): > > > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > Releasing the lock before the conditional does not protect the integrity > > of > > > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a > > > > > risk for the conditional behaving as intended, and consequently not > > > > > unmapping the zpool handle, which could cause a zswap zpool memory > > > > leak. > > > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional and > > > > > subsequent zpool_unmap_handle(). This ensures that the value of "src" > > > > > obtained earlier, with the mutex locked, does not change. > > > > > > > > The commit log is too complicated and incorrect. It is talking about > > > > the stability of 'src', but that's a local variable on the stack > > > > anyway. It doesn't need protection. > > > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after the > > > > mutex is released. Leading to the check not being reliable. Please > > > > simplify this. > > > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > > value changing after the lock is released. This could happen as a result of > > any > > > other compress or decompress that acquires the lock. I will simplify and > > > clarify accordingly. > > > > > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > > like a > > > > > cleaner implementation. > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > > --- > > > > > mm/zswap.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > > --- a/mm/zswap.c > > > > > +++ b/mm/zswap.c > > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > > zswap_entry *entry, struct folio *folio) > > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > > entry- > > > > >length, PAGE_SIZE); > > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > > >req), &acomp_ctx->wait)); > > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > Actually now that I think more about it, I think this check isn't > > > > entirely safe, even under the lock. Is it possible that > > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > > decompression at the same handle? In this case, we will also > > > > mistakenly skip the unmap. > > > > > > If we move the mutex_unlock to happen after the conditional and unmap, > > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > > take on only 2 values in this procedure: the mapped handle or > > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > > > Yes, that's the scenario I mean. > > > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > > happens to be equal to the same handle from a previous operation as we > > don't clear it. > > Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), > we only copy to it. Duh, yes. I confused myself, sorry for the noise. > > > > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > > though it should be true. This will result in an extra > > zpool_unmap_handle() call. I didn't look closely, but this seems like > > it can have a worse effect than leaking memory (e.g. there will be an > > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > > a random handle). > > > > > > > > Please let me know if I am missing anything. > > > > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > > acomp_ctx->buffer and do the unmap, and check that flag here to check > > > > if the unmap was done or not. > > > > > > Sure, this could be done, but not sure if it is required. Please let me know > > > if we still need the boolean variable in addition to moving the > > mutex_unlock(). > > > > If we use a boolean, there is no need to move mutex_unlock(). The > > boolean will be a local variable on the stack that doesn't need > > protection. > > I agree, using the boolean variable to do the unmap rather than the check > for (src != acomp_ctx->buffer) is more fail-safe. > > I am still thinking moving the mutex_unlock() could help, or at least have > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > safeguards the interaction between the decompress operation, the > sg_*() API calls inside zswap_decompress() and the shared zpool. > > If we release the per-cpu acomp_ctx's mutex lock before the > zpool_unmap_handle(), is it possible that another cpu could acquire > it's acomp_ctx's lock and map the same zpool handle (that the earlier > cpu has yet to unmap or is concurrently unmapping) for a write? > If this could happen, would it result in undefined state for both > these zpool ops on different cpu's? Why would this result in an undefined state? For zsmalloc, mapping uses a per-CPU buffer and preemption is disabled between mapping and unmapping anyway. If another CPU maps the object it should be fine. > > Would keeping the per-cpu mutex locked through the > zpool_unmap_handle() create a non-preemptible state that would > avoid this? IOW, if the above scenario is possible, does the per-cpu > acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow > prevented by the implementation of the zpools? At least for zsmalloc, I think it is. > > Just thought I would bring up these open questions. Please do share > your thoughts and advise. I think moving the mutex unlock after the unmap won't make much of a difference from a performance side, at least for zsmalloc. Preemption will be disabled until the unmap is done anyway, so even after we release the per-CPU mutex it cannot be acquired by anyone else until the unmap is done. Anyway, I think the fix you have right now is fine, if you prefer not adding a boolean. If you do add a boolean, whether you move the mutex unlock or not should not make a difference. Please just rewrite the commit log and CC stable (in the commit log, not in the email CC list). Thanks and sorry for the confusion! > > Thanks, > Kanchana > > > > > > > > > Thanks, > > > Kanchana > > > > > > > > > > > > + > > > > > + mutex_unlock(&acomp_ctx->mutex); > > > > > } > > > > > > > > > > /********************************* > > > > > > > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c > > > > > -- > > > > > 2.27.0 > > > > >
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Wednesday, November 13, 2024 12:11 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > Hi Yosry, > > > > > -----Original Message----- > > > From: Yosry Ahmed <yosryahmed@google.com> > > > Sent: Tuesday, November 12, 2024 10:22 PM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > hannes@cmpxchg.org; nphamcs@gmail.com; > chengming.zhou@linux.dev; > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > foundation.org; > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > <vinodh.gopal@intel.com> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > zswap_decompress(). > > > > > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yosry Ahmed <yosryahmed@google.com> > > > > > Sent: Tuesday, November 12, 2024 9:35 PM > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > > > hannes@cmpxchg.org; nphamcs@gmail.com; > > > chengming.zhou@linux.dev; > > > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > > > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux- > > > foundation.org; > > > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > > > > > <vinodh.gopal@intel.com> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > > > > > zswap_decompress(). > > > > > > > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar > > > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > > > > > This is a hotfix for a potential zpool memory leak that could result in > > > > > > the existing zswap_decompress(): > > > > > > > > > > > > mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > > > Releasing the lock before the conditional does not protect the > integrity > > > of > > > > > > "src", which is set earlier under the acomp_ctx mutex lock. This > poses a > > > > > > risk for the conditional behaving as intended, and consequently not > > > > > > unmapping the zpool handle, which could cause a zswap zpool > memory > > > > > leak. > > > > > > > > > > > > This patch moves the mutex_unlock() to occur after the conditional > and > > > > > > subsequent zpool_unmap_handle(). This ensures that the value of > "src" > > > > > > obtained earlier, with the mutex locked, does not change. > > > > > > > > > > The commit log is too complicated and incorrect. It is talking about > > > > > the stability of 'src', but that's a local variable on the stack > > > > > anyway. It doesn't need protection. > > > > > > > > > > The problem is 'acomp_ctx->buffer' being reused and changed after > the > > > > > mutex is released. Leading to the check not being reliable. Please > > > > > simplify this. > > > > > > > > Thanks Yosry. That's exactly what I meant, but I think the wording got > > > > confusing. The problem I was trying to fix is the acomp_ctx->buffer > > > > value changing after the lock is released. This could happen as a result > of > > > any > > > > other compress or decompress that acquires the lock. I will simplify and > > > > clarify accordingly. > > > > > > > > > > > > > > > > > > > > > Even though an actual memory leak was not observed, this fix seems > > > like a > > > > > > cleaner implementation. > > > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one") > > > > > > --- > > > > > > mm/zswap.c | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > > index f6316b66fb23..58810fa8ff23 100644 > > > > > > --- a/mm/zswap.c > > > > > > +++ b/mm/zswap.c > > > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct > > > > > zswap_entry *entry, struct folio *folio) > > > > > > acomp_request_set_params(acomp_ctx->req, &input, &output, > > > entry- > > > > > >length, PAGE_SIZE); > > > > > > > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx- > > > > > >req), &acomp_ctx->wait)); > > > > > > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE); > > > > > > - mutex_unlock(&acomp_ctx->mutex); > > > > > > > > > > > > if (src != acomp_ctx->buffer) > > > > > > zpool_unmap_handle(zpool, entry->handle); > > > > > > > > > > Actually now that I think more about it, I think this check isn't > > > > > entirely safe, even under the lock. Is it possible that > > > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous > > > > > decompression at the same handle? In this case, we will also > > > > > mistakenly skip the unmap. > > > > > > > > If we move the mutex_unlock to happen after the conditional and > unmap, > > > > shouldn't that be sufficient under all conditions? With the fix, "src" can > > > > take on only 2 values in this procedure: the mapped handle or > > > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case > > > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer. > > > > > > Yes, that's the scenario I mean. > > > > > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer' > > > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer' > > > happens to be equal to the same handle from a previous operation as we > > > don't clear it. > > > > Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(), > > we only copy to it. > > Duh, yes. I confused myself, sorry for the noise. > > > > > > > > > In this case, the 'src != acomp_ctx->buffer' check will be false, even > > > though it should be true. This will result in an extra > > > zpool_unmap_handle() call. I didn't look closely, but this seems like > > > it can have a worse effect than leaking memory (e.g. there will be an > > > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting > > > a random handle). > > > > > > > > > > > Please let me know if I am missing anything. > > > > > > > > > > > > > > It would be more reliable to set a boolean variable if we copy to > > > > > acomp_ctx->buffer and do the unmap, and check that flag here to > check > > > > > if the unmap was done or not. > > > > > > > > Sure, this could be done, but not sure if it is required. Please let me know > > > > if we still need the boolean variable in addition to moving the > > > mutex_unlock(). > > > > > > If we use a boolean, there is no need to move mutex_unlock(). The > > > boolean will be a local variable on the stack that doesn't need > > > protection. > > > > I agree, using the boolean variable to do the unmap rather than the check > > for (src != acomp_ctx->buffer) is more fail-safe. > > > > I am still thinking moving the mutex_unlock() could help, or at least have > > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock > > safeguards the interaction between the decompress operation, the > > sg_*() API calls inside zswap_decompress() and the shared zpool. > > > > If we release the per-cpu acomp_ctx's mutex lock before the > > zpool_unmap_handle(), is it possible that another cpu could acquire > > it's acomp_ctx's lock and map the same zpool handle (that the earlier > > cpu has yet to unmap or is concurrently unmapping) for a write? > > If this could happen, would it result in undefined state for both > > these zpool ops on different cpu's? > > Why would this result in an undefined state? For zsmalloc, mapping > uses a per-CPU buffer and preemption is disabled between mapping and > unmapping anyway. If another CPU maps the object it should be fine. > > > > > Would keeping the per-cpu mutex locked through the > > zpool_unmap_handle() create a non-preemptible state that would > > avoid this? IOW, if the above scenario is possible, does the per-cpu > > acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow > > prevented by the implementation of the zpools? > > At least for zsmalloc, I think it is. > > > > > Just thought I would bring up these open questions. Please do share > > your thoughts and advise. > > I think moving the mutex unlock after the unmap won't make much of a > difference from a performance side, at least for zsmalloc. Preemption > will be disabled until the unmap is done anyway, so even after we > release the per-CPU mutex it cannot be acquired by anyone else until > the unmap is done. > > Anyway, I think the fix you have right now is fine, if you prefer not > adding a boolean. If you do add a boolean, whether you move the mutex > unlock or not should not make a difference. Thanks for the guidance on next steps! I will add the boolean and move the mutex. > > Please just rewrite the commit log and CC stable (in the commit log, > not in the email CC list). I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log (as found from the latest mainline git log) ? Thanks again, Kanchana
[..] > > > > Please just rewrite the commit log and CC stable (in the commit log, > > not in the email CC list). > > I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log > (as found from the latest mainline git log) ? Yes, thanks! > > Thanks again, > Kanchana >
> -----Original Message----- > From: Yosry Ahmed <yosryahmed@google.com> > Sent: Wednesday, November 13, 2024 1:00 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org; > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh > <vinodh.gopal@intel.com> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in > zswap_decompress(). > > [..] > > > > > > Please just rewrite the commit log and CC stable (in the commit log, > > > not in the email CC list). > > > > I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log > > (as found from the latest mainline git log) ? > > Yes, thanks! Sounds good, thanks Yosry! > > > > > Thanks again, > > Kanchana > >
© 2016 - 2024 Red Hat, Inc.