[PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.

Posted by Kanchana P Sridhar 1 year, 2 months ago
In order to set up zswap_store_pages() to enable a clean batching
implementation in [1], this patch implements the following changes:

1) Addition of zswap_alloc_entries(), which allocates zswap entries upfront
   for all pages in the specified range of the folio. If this fails, we
   return an error status to zswap_store().

2) Addition of zswap_compress_pages(), which calls zswap_compress() for each
   page and returns false if any zswap_compress() fails, so that
   zswap_store_pages() can clean up the allocated resources and return an
   error status to zswap_store().

3) A "store_pages_failed" label that is a catch-all for all failure points
   in zswap_store_pages(). This facilitates cleaner error handling within
   zswap_store_pages(), which will become important for IAA compress
   batching in [1].

[1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 71 insertions(+), 22 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index b09d1023e775..db80c66e2205 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
 * main API
 **********************************/
 
+static bool zswap_compress_pages(struct page *pages[],
+				 struct zswap_entry *entries[],
+				 u8 nr_pages,
+				 struct zswap_pool *pool)
+{
+	u8 i;
+
+	for (i = 0; i < nr_pages; ++i) {
+		if (!zswap_compress(pages[i], entries[i], pool))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Allocate @nr zswap entries for storing @nr pages in a folio.
+ * If any one of the entry allocations fails, free all entries allocated
+ * thus far and return false.
+ * If @nr entries are successfully allocated, set each entry's "handle"
+ * to "ERR_PTR(-EINVAL)" to denote that the handle has not yet been allocated.
+ */
+static bool zswap_alloc_entries(struct zswap_entry *entries[], int node_id, u8 nr)
+{
+	u8 i;
+
+	for (i = 0; i < nr; ++i) {
+		entries[i] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
+		if (!entries[i]) {
+			u8 j;
+
+			zswap_reject_kmemcache_fail++;
+			for (j = 0; j < i; ++j)
+				zswap_entry_cache_free(entries[j]);
+			return false;
+		}
+
+		entries[i]->handle = (unsigned long)ERR_PTR(-EINVAL);
+	}
+
+	return true;
+}
+
 /*
  * Store multiple pages in @folio, starting from the page at index @si up to
  * and including the page at index @ei.
+ * Errors from all failure points are handled by the "store_pages_failed"
+ * label, based on the initial ERR_PTR(-EINVAL) value for the zswap_entry's
+ * handle set by zswap_alloc_entries(), and the fact that the entry's handle
+ * is subsequently modified only upon a successful zpool_malloc().
  */
 static ssize_t zswap_store_pages(struct folio *folio,
 				 long si,
@@ -1419,26 +1466,25 @@ static ssize_t zswap_store_pages(struct folio *folio,
 				 struct obj_cgroup *objcg,
 				 struct zswap_pool *pool)
 {
-	struct page *page;
-	swp_entry_t page_swpentry;
-	struct zswap_entry *entry, *old;
+	struct zswap_entry *entries[SWAP_CRYPTO_BATCH_SIZE], *old;
+	struct page *pages[SWAP_CRYPTO_BATCH_SIZE];
 	size_t compressed_bytes = 0;
 	u8 nr_pages = ei - si + 1;
 	u8 i;
 
-	for (i = 0; i < nr_pages; ++i) {
-		page = folio_page(folio, si + i);
-		page_swpentry = page_swap_entry(page);
+	/* allocate entries */
+	if (!zswap_alloc_entries(entries, folio_nid(folio), nr_pages))
+		return -EINVAL;
 
-		/* allocate entry */
-		entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
-		if (!entry) {
-			zswap_reject_kmemcache_fail++;
-			return -EINVAL;
-		}
+	for (i = 0; i < nr_pages; ++i)
+		pages[i] = folio_page(folio, si + i);
 
-		if (!zswap_compress(page, entry, pool))
-			goto compress_failed;
+	if (!zswap_compress_pages(pages, entries, nr_pages, pool))
+		goto store_pages_failed;
+
+	for (i = 0; i < nr_pages; ++i) {
+		swp_entry_t page_swpentry = page_swap_entry(pages[i]);
+		struct zswap_entry *entry = entries[i];
 
 		old = xa_store(swap_zswap_tree(page_swpentry),
 			       swp_offset(page_swpentry),
@@ -1448,7 +1494,7 @@ static ssize_t zswap_store_pages(struct folio *folio,
 
 			WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
 			zswap_reject_alloc_fail++;
-			goto store_failed;
+			goto store_pages_failed;
 		}
 
 		/*
@@ -1489,16 +1535,19 @@ static ssize_t zswap_store_pages(struct folio *folio,
 		}
 
 		compressed_bytes += entry->length;
-		continue;
-
-store_failed:
-		zpool_free(pool->zpool, entry->handle);
-compress_failed:
-		zswap_entry_cache_free(entry);
-		return -EINVAL;
 	}
 
 	return compressed_bytes;
+
+store_pages_failed:
+	for (i = 0; i < nr_pages; ++i) {
+		if (!IS_ERR_VALUE(entries[i]->handle))
+			zpool_free(pool->zpool, entries[i]->handle);
+
+		zswap_entry_cache_free(entries[i]);
+	}
+
+	return -EINVAL;
 }
 
 bool zswap_store(struct folio *folio)
-- 
2.27.0
Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Chengming Zhou 1 year, 2 months ago
On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> In order to set up zswap_store_pages() to enable a clean batching
> implementation in [1], this patch implements the following changes:
> 
> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
>     all pages in the specified range for the folio, upfront. If this fails,
>     we return an error status to zswap_store().
> 
> 2) Addition of zswap_compress_pages() that calls zswap_compress() for each
>     page, and returns false if any zswap_compress() fails, so
>     zswap_store_page() can cleanup resources allocated and return an error
>     status to zswap_store().
> 
> 3) A "store_pages_failed" label that is a catch-all for all failure points
>     in zswap_store_pages(). This facilitates cleaner error handling within
>     zswap_store_pages(), which will become important for IAA compress
>     batching in [1].
> 
> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> 
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> ---
>   mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 71 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index b09d1023e775..db80c66e2205 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
>   * main API
>   **********************************/
>   
> +static bool zswap_compress_pages(struct page *pages[],
> +				 struct zswap_entry *entries[],
> +				 u8 nr_pages,
> +				 struct zswap_pool *pool)
> +{
> +	u8 i;
> +
> +	for (i = 0; i < nr_pages; ++i) {
> +		if (!zswap_compress(pages[i], entries[i], pool))
> +			return false;
> +	}
> +
> +	return true;
> +}

How about introducing a `zswap_compress_folio()` interface which
can be used by `zswap_store()`?
```
zswap_store()
	nr_pages = folio_nr_pages(folio)

	entries = zswap_alloc_entries(nr_pages)

	ret = zswap_compress_folio(folio, entries, pool)

	// store entries into xarray and LRU list
```

And this version `zswap_compress_folio()` is very simple for now:
```
zswap_compress_folio()
	nr_pages = folio_nr_pages(folio)

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		if (!zswap_compress(page, &entries[index], pool))
			return false;
	}

	return true;
```
This can be easily extended to support your "batched" version.

Then the old `zswap_store_page()` could be removed.

The good point is simplicity: we don't need to slice the folio
into multiple batches and then repeat the common operations for each
batch, like preparing entries and storing into the xarray and LRU list...

Thanks.
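
A minimal C rendering of the helper sketched above, kept close to the pseudocode; it only assumes the zswap_compress(page, entry, pool) signature already used in this patch, and the entries[] array (one pre-allocated zswap_entry per page) is supplied by the caller. Names are illustrative, not part of the posted patch:

```
/* Illustrative sketch, not part of the posted patch. */
static bool zswap_compress_folio(struct folio *folio,
				 struct zswap_entry *entries[],
				 struct zswap_pool *pool)
{
	long nr_pages = folio_nr_pages(folio);
	long index;

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		/* Stop at the first failure; the caller frees all entries. */
		if (!zswap_compress(page, entries[index], pool))
			return false;
	}

	return true;
}
```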
Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Nhat Pham 1 year, 2 months ago
On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
<chengming.zhou@linux.dev> wrote:
>
> How about introducing a `zswap_compress_folio()` interface which
> can be used by `zswap_store()`?
> ```
> zswap_store()
>         nr_pages = folio_nr_pages(folio)
>
>         entries = zswap_alloc_entries(nr_pages)
>
>         ret = zswap_compress_folio(folio, entries, pool)
>
>         // store entries into xarray and LRU list
> ```
>
> And this version `zswap_compress_folio()` is very simple for now:
> ```
> zswap_compress_folio()
>         nr_pages = folio_nr_pages(folio)
>
>         for (index = 0; index < nr_pages; ++index) {
>                 struct page *page = folio_page(folio, index);
>
>                 if (!zswap_compress(page, &entries[index], pool))
>                         return false;
>         }
>
>         return true;
> ```
> This can be easily extended to support your "batched" version.
>
> Then the old `zswap_store_page()` could be removed.
>
> The good point is simplicity, that we don't need to slice folio
> into multiple batches, then repeat the common operations for each
> batch, like preparing entries, storing into xarray and LRU list...
>

+1.
RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Sridhar, Kanchana P 1 year, 2 months ago
> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Monday, December 2, 2024 4:17 PM
> To: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> yosryahmed@google.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> <chengming.zhou@linux.dev> wrote:
> >
> > How about introducing a `zswap_compress_folio()` interface which
> > can be used by `zswap_store()`?
> > ```
> > zswap_store()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         entries = zswap_alloc_entries(nr_pages)
> >
> >         ret = zswap_compress_folio(folio, entries, pool)
> >
> >         // store entries into xarray and LRU list
> > ```
> >
> > And this version `zswap_compress_folio()` is very simple for now:
> > ```
> > zswap_compress_folio()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         for (index = 0; index < nr_pages; ++index) {
> >                 struct page *page = folio_page(folio, index);
> >
> >                 if (!zswap_compress(page, &entries[index], pool))
> >                         return false;
> >         }
> >
> >         return true;
> > ```
> > This can be easily extended to support your "batched" version.
> >
> > Then the old `zswap_store_page()` could be removed.
> >
> > The good point is simplicity, that we don't need to slice folio
> > into multiple batches, then repeat the common operations for each
> > batch, like preparing entries, storing into xarray and LRU list...
> >
> 
> +1.

Thanks, Nhat. I added some potential considerations in my reply to
Chengming's and Yosry's comments. I would appreciate it if you could
add follow-up suggestions to that reply.

Thanks,
Kanchana
Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Yosry Ahmed 1 year, 2 months ago
On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
<chengming.zhou@linux.dev> wrote:
>
> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > In order to set up zswap_store_pages() to enable a clean batching
> > implementation in [1], this patch implements the following changes:
> >
> > 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> >     all pages in the specified range for the folio, upfront. If this fails,
> >     we return an error status to zswap_store().
> >
> > 2) Addition of zswap_compress_pages() that calls zswap_compress() for each
> >     page, and returns false if any zswap_compress() fails, so
> >     zswap_store_page() can cleanup resources allocated and return an error
> >     status to zswap_store().
> >
> > 3) A "store_pages_failed" label that is a catch-all for all failure points
> >     in zswap_store_pages(). This facilitates cleaner error handling within
> >     zswap_store_pages(), which will become important for IAA compress
> >     batching in [1].
> >
> > [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > ---
> >   mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
> >   1 file changed, 71 insertions(+), 22 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index b09d1023e775..db80c66e2205 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
> >   * main API
> >   **********************************/
> >
> > +static bool zswap_compress_pages(struct page *pages[],
> > +                              struct zswap_entry *entries[],
> > +                              u8 nr_pages,
> > +                              struct zswap_pool *pool)
> > +{
> > +     u8 i;
> > +
> > +     for (i = 0; i < nr_pages; ++i) {
> > +             if (!zswap_compress(pages[i], entries[i], pool))
> > +                     return false;
> > +     }
> > +
> > +     return true;
> > +}
>
> How about introducing a `zswap_compress_folio()` interface which
> can be used by `zswap_store()`?
> ```
> zswap_store()
>         nr_pages = folio_nr_pages(folio)
>
>         entries = zswap_alloc_entries(nr_pages)
>
>         ret = zswap_compress_folio(folio, entries, pool)
>
>         // store entries into xarray and LRU list
> ```
>
> And this version `zswap_compress_folio()` is very simple for now:
> ```
> zswap_compress_folio()
>         nr_pages = folio_nr_pages(folio)
>
>         for (index = 0; index < nr_pages; ++index) {
>                 struct page *page = folio_page(folio, index);
>
>                 if (!zswap_compress(page, &entries[index], pool))
>                         return false;
>         }
>
>         return true;
> ```
> This can be easily extended to support your "batched" version.
>
> Then the old `zswap_store_page()` could be removed.
>
> The good point is simplicity, that we don't need to slice folio
> into multiple batches, then repeat the common operations for each
> batch, like preparing entries, storing into xarray and LRU list...

+1

Also, I don't like the helpers hiding some of the loops and leaving
others. As Johannes said, please keep all the iteration over pages at
the same function level where possible to make the code clear.

This should not be a separate series either; when I said divide into
chunks, I meant leave out the multiple-folio batching and focus on
batching pages in a single large folio, not break the series down into
multiple ones. Not a big deal tho :)

>
> Thanks.
RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Sridhar, Kanchana P 1 year, 2 months ago
Hi Chengming, Yosry,

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Monday, December 2, 2024 11:33 AM
> To: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> <chengming.zhou@linux.dev> wrote:
> >
> > On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > > In order to set up zswap_store_pages() to enable a clean batching
> > > implementation in [1], this patch implements the following changes:
> > >
> > > 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> > >     all pages in the specified range for the folio, upfront. If this fails,
> > >     we return an error status to zswap_store().
> > >
> > > 2) Addition of zswap_compress_pages() that calls zswap_compress() for
> each
> > >     page, and returns false if any zswap_compress() fails, so
> > >     zswap_store_page() can cleanup resources allocated and return an
> error
> > >     status to zswap_store().
> > >
> > > 3) A "store_pages_failed" label that is a catch-all for all failure points
> > >     in zswap_store_pages(). This facilitates cleaner error handling within
> > >     zswap_store_pages(), which will become important for IAA compress
> > >     batching in [1].
> > >
> > > [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> > >
> > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > ---
> > >   mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++----
> ---------
> > >   1 file changed, 71 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index b09d1023e775..db80c66e2205 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
> *w)
> > >   * main API
> > >   **********************************/
> > >
> > > +static bool zswap_compress_pages(struct page *pages[],
> > > +                              struct zswap_entry *entries[],
> > > +                              u8 nr_pages,
> > > +                              struct zswap_pool *pool)
> > > +{
> > > +     u8 i;
> > > +
> > > +     for (i = 0; i < nr_pages; ++i) {
> > > +             if (!zswap_compress(pages[i], entries[i], pool))
> > > +                     return false;
> > > +     }
> > > +
> > > +     return true;
> > > +}
> >
> > How about introducing a `zswap_compress_folio()` interface which
> > can be used by `zswap_store()`?
> > ```
> > zswap_store()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         entries = zswap_alloc_entries(nr_pages)
> >
> >         ret = zswap_compress_folio(folio, entries, pool)
> >
> >         // store entries into xarray and LRU list
> > ```
> >
> > And this version `zswap_compress_folio()` is very simple for now:
> > ```
> > zswap_compress_folio()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         for (index = 0; index < nr_pages; ++index) {
> >                 struct page *page = folio_page(folio, index);
> >
> >                 if (!zswap_compress(page, &entries[index], pool))
> >                         return false;
> >         }
> >
> >         return true;
> > ```
> > This can be easily extended to support your "batched" version.
> >
> > Then the old `zswap_store_page()` could be removed.
> >
> > The good point is simplicity, that we don't need to slice folio
> > into multiple batches, then repeat the common operations for each
> > batch, like preparing entries, storing into xarray and LRU list...
> 
> +1

Thanks for the code review comments. One question though: would
it make sense to trade off the memory footprint cost against the code
simplification? For instance, let's say we want to store a 64k folio.
We would allocate memory for 16 zswap entries, and if one of the
compressions fails, we would deallocate memory for all 16 zswap
entries. Could this lead to zswap_entry kmem_cache starvation and
subsequent zswap_store() failures in multi-process scenarios?

In other words, allocating entries in smaller batches -- more specifically,
only the compress batch size -- seems to strike a balance in terms of
memory footprint, while mitigating the starvation aspect, and possibly
also helping latency (allocating, and potentially deallocating, a large
number of zswap entries could impact latency).

If we agree on the merits of processing a large folio in smaller batches,
this in turn requires that we store each smaller batch of entries in the
xarray/LRU before moving to the next batch. In other words, all the
zswap_store() ops need to be done for a batch before moving to the next
one.
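
For concreteness, the batch-wise flow being argued for here is roughly the following caller-side sketch (SWAP_CRYPTO_BATCH_SIZE is from patch 1/2 of this series; the objcg/pool setup and the unwind path are elided, so this is illustrative rather than the exact series code):

```
	/* Sketch: zswap_store() slicing a large folio into compress batches. */
	long nr_pages = folio_nr_pages(folio);
	long start;

	for (start = 0; start < nr_pages; start += SWAP_CRYPTO_BATCH_SIZE) {
		long end = min(start + SWAP_CRYPTO_BATCH_SIZE, nr_pages) - 1;
		ssize_t bytes = zswap_store_pages(folio, start, end, objcg, pool);

		if (bytes < 0)
			break;	/* unwind as zswap_store() already does */

		/* this batch is in the xarray/LRU before the next one starts */
	}
```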

> 
> Also, I don't like the helpers hiding some of the loops and leaving
> others, as Johannes said, please keep all the iteration over pages at
> the same function level where possible to make the code clear.

Sure. I can either inline all the loops into zswap_store_pages(), or convert
all iterations into helpers with a consistent signature:

zswap_<proc_name>(arrayed_struct, nr_pages);

Please let me know which would work best. Thanks!

> 
> This should not be a separate series too, when I said divide into
> chunks I meant leave out the multiple folios batching and focus on
> batching pages in a single large folio, not breaking down the series
> into multiple ones. Not a big deal tho :)

I understand. I am trying to decouple and develop in parallel the
following, which I intend to converge into a v5 of the original series [1]:
  a) Vectorization, followed by batching of zswap_store() of large folios.
  b) acomp request chaining suggestions from Herbert, which could
       change the existing v4 implementation of the
       crypto_acomp_batch_compress() API that zswap would need to
       call for IAA compress batching.

[1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935

Thanks,
Kanchana

> 
> >
> > Thanks.
Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Chengming Zhou 1 year, 2 months ago
On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> Hi Chengming, Yosry,
> 
>> -----Original Message-----
>> From: Yosry Ahmed <yosryahmed@google.com>
>> Sent: Monday, December 2, 2024 11:33 AM
>> To: Chengming Zhou <chengming.zhou@linux.dev>
>> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
>> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
>> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
>> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
>> for batching.
>>
>> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
>> <chengming.zhou@linux.dev> wrote:
>>>
>>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
>>>> In order to set up zswap_store_pages() to enable a clean batching
>>>> implementation in [1], this patch implements the following changes:
>>>>
>>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
>>>>      all pages in the specified range for the folio, upfront. If this fails,
>>>>      we return an error status to zswap_store().
>>>>
>>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for
>> each
>>>>      page, and returns false if any zswap_compress() fails, so
>>>>      zswap_store_page() can cleanup resources allocated and return an
>> error
>>>>      status to zswap_store().
>>>>
>>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
>>>>      in zswap_store_pages(). This facilitates cleaner error handling within
>>>>      zswap_store_pages(), which will become important for IAA compress
>>>>      batching in [1].
>>>>
>>>> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
>>>>
>>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
>>>> ---
>>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++----
>> ---------
>>>>    1 file changed, 71 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/mm/zswap.c b/mm/zswap.c
>>>> index b09d1023e775..db80c66e2205 100644
>>>> --- a/mm/zswap.c
>>>> +++ b/mm/zswap.c
>>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
>> *w)
>>>>    * main API
>>>>    **********************************/
>>>>
>>>> +static bool zswap_compress_pages(struct page *pages[],
>>>> +                              struct zswap_entry *entries[],
>>>> +                              u8 nr_pages,
>>>> +                              struct zswap_pool *pool)
>>>> +{
>>>> +     u8 i;
>>>> +
>>>> +     for (i = 0; i < nr_pages; ++i) {
>>>> +             if (!zswap_compress(pages[i], entries[i], pool))
>>>> +                     return false;
>>>> +     }
>>>> +
>>>> +     return true;
>>>> +}
>>>
>>> How about introducing a `zswap_compress_folio()` interface which
>>> can be used by `zswap_store()`?
>>> ```
>>> zswap_store()
>>>          nr_pages = folio_nr_pages(folio)
>>>
>>>          entries = zswap_alloc_entries(nr_pages)
>>>
>>>          ret = zswap_compress_folio(folio, entries, pool)
>>>
>>>          // store entries into xarray and LRU list
>>> ```
>>>
>>> And this version `zswap_compress_folio()` is very simple for now:
>>> ```
>>> zswap_compress_folio()
>>>          nr_pages = folio_nr_pages(folio)
>>>
>>>          for (index = 0; index < nr_pages; ++index) {
>>>                  struct page *page = folio_page(folio, index);
>>>
>>>                  if (!zswap_compress(page, &entries[index], pool))
>>>                          return false;
>>>          }
>>>
>>>          return true;
>>> ```
>>> This can be easily extended to support your "batched" version.
>>>
>>> Then the old `zswap_store_page()` could be removed.
>>>
>>> The good point is simplicity, that we don't need to slice folio
>>> into multiple batches, then repeat the common operations for each
>>> batch, like preparing entries, storing into xarray and LRU list...
>>
>> +1
> 
> Thanks for the code review comments. One question though: would
> it make sense to trade-off the memory footprint cost with the code
> simplification? For instance, lets say we want to store a 64k folio.
> We would allocate memory for 16 zswap entries, and lets say one of
> the compressions fails, we would deallocate memory for all 16 zswap
> entries. Could this lead to zswap_entry kmem_cache starvation and
> subsequent zswap_store() failures in multiple processes scenarios?

Ah, I get your consideration. But it's the unlikely case, right?

If the case you mentioned above happens a lot, I think yes, we should
optimize its memory footprint to avoid allocation and deallocation.

On the other hand, we should consider that a folio will be compressed
successfully in most cases, so we have to allocate all the entries
eventually.

Based on your consideration, I think your way is ok too, although
I think patch 2/2 should be dropped, since it hides the page loops
in smaller functions, as Yosry mentioned too.

> 
> In other words, allocating entries in smaller batches -- more specifically,
> only the compress batchsize -- seems to strike a balance in terms of
> memory footprint, while mitigating the starvation aspect, and possibly
> also helping latency (allocating a large # of zswap entries and potentially
> deallocating, could impact latency).

If we consider the likely case (successful compression), the overall
latency should be better, right? Since we can bulk-allocate all
entries at first, and bulk-insert into the xarray and LRU at the end.

> 
> If we agree with the merits of processing a large folio in smaller batches:
> this in turn requires we store the smaller batches of entries in the
> xarray/LRU before moving to the next batch. Which means all the
> zswap_store() ops need to be done for a batch before moving to the next
> batch.
> 

Either way is ok for me, based on your memory footprint consideration
above.

Thanks.
RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Sridhar, Kanchana P 1 year, 2 months ago
> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@linux.dev>
> Sent: Monday, December 2, 2024 7:06 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Yosry Ahmed
> <yosryahmed@google.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > Hi Chengming, Yosry,
> >
> >> -----Original Message-----
> >> From: Yosry Ahmed <yosryahmed@google.com>
> >> Sent: Monday, December 2, 2024 11:33 AM
> >> To: Chengming Zhou <chengming.zhou@linux.dev>
> >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> >> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> >> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> simplifications
> >> for batching.
> >>
> >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> >> <chengming.zhou@linux.dev> wrote:
> >>>
> >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> >>>> In order to set up zswap_store_pages() to enable a clean batching
> >>>> implementation in [1], this patch implements the following changes:
> >>>>
> >>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> >>>>      all pages in the specified range for the folio, upfront. If this fails,
> >>>>      we return an error status to zswap_store().
> >>>>
> >>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for
> >> each
> >>>>      page, and returns false if any zswap_compress() fails, so
> >>>>      zswap_store_page() can cleanup resources allocated and return an
> >> error
> >>>>      status to zswap_store().
> >>>>
> >>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
> >>>>      in zswap_store_pages(). This facilitates cleaner error handling within
> >>>>      zswap_store_pages(), which will become important for IAA compress
> >>>>      batching in [1].
> >>>>
> >>>> [1]: https://patchwork.kernel.org/project/linux-
> mm/list/?series=911935
> >>>>
> >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> >>>> ---
> >>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-
> ---
> >> ---------
> >>>>    1 file changed, 71 insertions(+), 22 deletions(-)
> >>>>
> >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> >>>> index b09d1023e775..db80c66e2205 100644
> >>>> --- a/mm/zswap.c
> >>>> +++ b/mm/zswap.c
> >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
> >> *w)
> >>>>    * main API
> >>>>    **********************************/
> >>>>
> >>>> +static bool zswap_compress_pages(struct page *pages[],
> >>>> +                              struct zswap_entry *entries[],
> >>>> +                              u8 nr_pages,
> >>>> +                              struct zswap_pool *pool)
> >>>> +{
> >>>> +     u8 i;
> >>>> +
> >>>> +     for (i = 0; i < nr_pages; ++i) {
> >>>> +             if (!zswap_compress(pages[i], entries[i], pool))
> >>>> +                     return false;
> >>>> +     }
> >>>> +
> >>>> +     return true;
> >>>> +}
> >>>
> >>> How about introducing a `zswap_compress_folio()` interface which
> >>> can be used by `zswap_store()`?
> >>> ```
> >>> zswap_store()
> >>>          nr_pages = folio_nr_pages(folio)
> >>>
> >>>          entries = zswap_alloc_entries(nr_pages)
> >>>
> >>>          ret = zswap_compress_folio(folio, entries, pool)
> >>>
> >>>          // store entries into xarray and LRU list
> >>> ```
> >>>
> >>> And this version `zswap_compress_folio()` is very simple for now:
> >>> ```
> >>> zswap_compress_folio()
> >>>          nr_pages = folio_nr_pages(folio)
> >>>
> >>>          for (index = 0; index < nr_pages; ++index) {
> >>>                  struct page *page = folio_page(folio, index);
> >>>
> >>>                  if (!zswap_compress(page, &entries[index], pool))
> >>>                          return false;
> >>>          }
> >>>
> >>>          return true;
> >>> ```
> >>> This can be easily extended to support your "batched" version.
> >>>
> >>> Then the old `zswap_store_page()` could be removed.
> >>>
> >>> The good point is simplicity, that we don't need to slice folio
> >>> into multiple batches, then repeat the common operations for each
> >>> batch, like preparing entries, storing into xarray and LRU list...
> >>
> >> +1
> >
> > Thanks for the code review comments. One question though: would
> > it make sense to trade-off the memory footprint cost with the code
> > simplification? For instance, lets say we want to store a 64k folio.
> > We would allocate memory for 16 zswap entries, and lets say one of
> > the compressions fails, we would deallocate memory for all 16 zswap
> > entries. Could this lead to zswap_entry kmem_cache starvation and
> > subsequent zswap_store() failures in multiple processes scenarios?
> 
> Ah, I get your consideration. But it's the unlikely case, right?
> 
> If the case you mentioned above happens a lot, I think yes, we should
> optimize its memory footprint to avoid allocation and deallocation.

Thanks Chengming. I see your point. Let me gather performance data
for the two options, and share.

> 
> On the other hand, we should consider a folio would be compressed
> successfully in most cases. So we have to allocate all entries
> eventually.
> 
> Based on your consideration, I think your way is ok too, although
> I think the patch 2/2 should be dropped, since it hides pages loop
> in smaller functions, as Yosry mentioned too.

My main intent with patch 2/2 was to make the error handling path
common and simpler, whether errors are encountered during compression,
zpool_malloc() or the xarray store. Hence, I initialize each allocated
zswap_entry's handle in zswap_alloc_entries() to ERR_PTR(-EINVAL), so it
is easy for the common error handling code in patch 2 to determine
whether the handle was allocated (and hence needs to be freed). This
benefits the batching code by eliminating the need to track which stage
of zswap_store_pages() saw an error, and hence which resources need to
be freed.
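
Condensed from the diff above, the pattern being described is a single catch-all cleanup path keyed off that sentinel:

```
store_pages_failed:
	for (i = 0; i < nr_pages; ++i) {
		/* handle is only overwritten after a successful zpool_malloc() */
		if (!IS_ERR_VALUE(entries[i]->handle))
			zpool_free(pool->zpool, entries[i]->handle);

		zswap_entry_cache_free(entries[i]);
	}

	return -EINVAL;
```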

My key consideration is to keep the batching error handling code simple,
hence these changes in patch 2. The changes described above would
help batching, and should not impact the non-batching case, as indicated
by the regression testing data I've included in the cover letter.

I don't mind inlining the implementation of the helper functions, as I
mentioned in my response to Yosry. I am hoping the error handling
simplifications are acceptable, since they will help the batching code.

> 
> >
> > In other words, allocating entries in smaller batches -- more specifically,
> > only the compress batchsize -- seems to strike a balance in terms of
> > memory footprint, while mitigating the starvation aspect, and possibly
> > also helping latency (allocating a large # of zswap entries and potentially
> > deallocating, could impact latency).
> 
> If we consider the likely case (compress successfully), the whole
> latency should be better, right? Since we can bulk allocate all
> entries at first, and bulk insert to xarray and LRU at last.

I think so too, but would like to confirm with some experiments and update.

> 
> >
> > If we agree with the merits of processing a large folio in smaller batches:
> > this in turn requires we store the smaller batches of entries in the
> > xarray/LRU before moving to the next batch. Which means all the
> > zswap_store() ops need to be done for a batch before moving to the next
> > batch.
> >
> 
> Both way is ok for me based on your memory footprint consideration
> above.

Sounds good, thanks!

Thanks,
Kanchana

> 
> Thanks.
Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Yosry Ahmed 1 year, 2 months ago
On Mon, Dec 2, 2024 at 8:19 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Chengming Zhou <chengming.zhou@linux.dev>
> > Sent: Monday, December 2, 2024 7:06 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Yosry Ahmed
> > <yosryahmed@google.com>
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
> > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> > for batching.
> >
> > On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > > Hi Chengming, Yosry,
> > >
> > >> -----Original Message-----
> > >> From: Yosry Ahmed <yosryahmed@google.com>
> > >> Sent: Monday, December 2, 2024 11:33 AM
> > >> To: Chengming Zhou <chengming.zhou@linux.dev>
> > >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> > >> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > >> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> > simplifications
> > >> for batching.
> > >>
> > >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> > >> <chengming.zhou@linux.dev> wrote:
> > >>>
> > >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > >>>> In order to set up zswap_store_pages() to enable a clean batching
> > >>>> implementation in [1], this patch implements the following changes:
> > >>>>
> > >>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> > >>>>      all pages in the specified range for the folio, upfront. If this fails,
> > >>>>      we return an error status to zswap_store().
> > >>>>
> > >>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for
> > >> each
> > >>>>      page, and returns false if any zswap_compress() fails, so
> > >>>>      zswap_store_page() can cleanup resources allocated and return an
> > >> error
> > >>>>      status to zswap_store().
> > >>>>
> > >>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
> > >>>>      in zswap_store_pages(). This facilitates cleaner error handling within
> > >>>>      zswap_store_pages(), which will become important for IAA compress
> > >>>>      batching in [1].
> > >>>>
> > >>>> [1]: https://patchwork.kernel.org/project/linux-
> > mm/list/?series=911935
> > >>>>
> > >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > >>>> ---
> > >>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-
> > ---
> > >> ---------
> > >>>>    1 file changed, 71 insertions(+), 22 deletions(-)
> > >>>>
> > >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> > >>>> index b09d1023e775..db80c66e2205 100644
> > >>>> --- a/mm/zswap.c
> > >>>> +++ b/mm/zswap.c
> > >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
> > >> *w)
> > >>>>    * main API
> > >>>>    **********************************/
> > >>>>
> > >>>> +static bool zswap_compress_pages(struct page *pages[],
> > >>>> +                              struct zswap_entry *entries[],
> > >>>> +                              u8 nr_pages,
> > >>>> +                              struct zswap_pool *pool)
> > >>>> +{
> > >>>> +     u8 i;
> > >>>> +
> > >>>> +     for (i = 0; i < nr_pages; ++i) {
> > >>>> +             if (!zswap_compress(pages[i], entries[i], pool))
> > >>>> +                     return false;
> > >>>> +     }
> > >>>> +
> > >>>> +     return true;
> > >>>> +}
> > >>>
> > >>> How about introducing a `zswap_compress_folio()` interface which
> > >>> can be used by `zswap_store()`?
> > >>> ```
> > >>> zswap_store()
> > >>>          nr_pages = folio_nr_pages(folio)
> > >>>
> > >>>          entries = zswap_alloc_entries(nr_pages)
> > >>>
> > >>>          ret = zswap_compress_folio(folio, entries, pool)
> > >>>
> > >>>          // store entries into xarray and LRU list
> > >>> ```
> > >>>
> > >>> And this version `zswap_compress_folio()` is very simple for now:
> > >>> ```
> > >>> zswap_compress_folio()
> > >>>          nr_pages = folio_nr_pages(folio)
> > >>>
> > >>>          for (index = 0; index < nr_pages; ++index) {
> > >>>                  struct page *page = folio_page(folio, index);
> > >>>
> > >>>                  if (!zswap_compress(page, &entries[index], pool))
> > >>>                          return false;
> > >>>          }
> > >>>
> > >>>          return true;
> > >>> ```
> > >>> This can be easily extended to support your "batched" version.
> > >>>
> > >>> Then the old `zswap_store_page()` could be removed.
> > >>>
> > >>> The good point is simplicity, that we don't need to slice folio
> > >>> into multiple batches, then repeat the common operations for each
> > >>> batch, like preparing entries, storing into xarray and LRU list...
> > >>
> > >> +1
> > >
> > > Thanks for the code review comments. One question though: would
> > > it make sense to trade-off the memory footprint cost with the code
> > > simplification? For instance, lets say we want to store a 64k folio.
> > > We would allocate memory for 16 zswap entries, and lets say one of
> > > the compressions fails, we would deallocate memory for all 16 zswap
> > > entries. Could this lead to zswap_entry kmem_cache starvation and
> > > subsequent zswap_store() failures in multiple processes scenarios?
> >
> > Ah, I get your consideration. But it's the unlikely case, right?
> >
> > If the case you mentioned above happens a lot, I think yes, we should
> > optimize its memory footprint to avoid allocation and deallocation.
>
> Thanks Chengming. I see your point. Let me gather performance data
> for the two options, and share.

Yeah I think we shouldn't optimize for the uncommon case, not until
there's a real problem that needs fixing.

>
> >
> > On the other hand, we should consider a folio would be compressed
> > successfully in most cases. So we have to allocate all entries
> > eventually.
> >
> > Based on your consideration, I think your way is ok too, although
> > I think the patch 2/2 should be dropped, since it hides pages loop
> > in smaller functions, as Yosry mentioned too.
>
> My main intent with patch 2/2 was to set up the error handling
> path to be common and simpler, whether errors were encountered
> during compression/zpool_malloc/xarray store. Hence, I initialize the
> allocated zswap_entry's handle in zswap_alloc_entries() to ERR_PTR(-EINVAL),
> so it is easy for the common error handling code in patch 2 to determine
> if the handle was allocated (and hence needs to be freed). This benefits
> the batching code by eliminating the need to maintain state as to which
> stage of zswap_store_pages() sees an error, based on which resources
> would need to be deleted.
>
> My key consideration is to keep the batching error handling code simple,
> hence these changes in patch 2. The changes described above would
> help batching, and should not impact the non-batching case, as indicated
> by the regression testing data I've included in the cover letter.
>
> I don't mind inlining the implementation of the helper functions, as I
> mentioned in my response to Yosry. I am hoping the error handling
> simplifications are acceptable, since they will help the batching code.

I think having the loops open-coded should still be better than
separate helpers. But I understand not wanting to have the loops
directly in zswap_store(), as the error handling would be simpler if
we do it in a separate function like zswap_store_pages().

How about we just move the loop from zswap_store() to
zswap_store_page() and call it zswap_store_folio()? When batching is
added, I imagine we may need to split the loop into two loops, before
and after zswap_compress_folio(), which isn't very neat but is
probably fine.
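
In the same pseudocode style as Chengming's sketch above, the suggested shape would be roughly as follows (the function names beyond those already discussed and the cleanup path are illustrative assumptions, not part of the suggestion):

```
zswap_store_folio(folio, objcg, pool)
	nr_pages = folio_nr_pages(folio)

	/* loop 1: allocate one zswap_entry per page, upfront */
	entries = zswap_alloc_entries(nr_pages)

	/* compress the whole folio; batched internally once IAA batching lands */
	if (!zswap_compress_folio(folio, entries, pool))
		goto store_folio_failed

	/* loop 2: store each entry into the xarray and the LRU list */
	for (i = 0; i < nr_pages; ++i)
		/* xa_store(), objcg charging, zswap_lru_add(), ... */

	return compressed_bytes

store_folio_failed:
	/* free handles/entries, as in the patch's store_pages_failed path */
	return -EINVAL
```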
RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
Posted by Sridhar, Kanchana P 1 year, 2 months ago
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Monday, December 2, 2024 9:50 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On Mon, Dec 2, 2024 at 8:19 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > Sent: Monday, December 2, 2024 7:06 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Yosry Ahmed
> > > <yosryahmed@google.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-
> foundation.org;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> simplifications
> > > for batching.
> > >
> > > On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > > > Hi Chengming, Yosry,
> > > >
> > > >> -----Original Message-----
> > > >> From: Yosry Ahmed <yosryahmed@google.com>
> > > >> Sent: Monday, December 2, 2024 11:33 AM
> > > >> To: Chengming Zhou <chengming.zhou@linux.dev>
> > > >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> > > >> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > > >> nphamcs@gmail.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com;
> > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> > > simplifications
> > > >> for batching.
> > > >>
> > > >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> > > >> <chengming.zhou@linux.dev> wrote:
> > > >>>
> > > >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > > >>>> In order to set up zswap_store_pages() to enable a clean batching
> > > >>>> implementation in [1], this patch implements the following changes:
> > > >>>>
> > > >>>> 1) Addition of zswap_alloc_entries() which will allocate zswap
> entries for
> > > >>>>      all pages in the specified range for the folio, upfront. If this fails,
> > > >>>>      we return an error status to zswap_store().
> > > >>>>
> > > >>>> 2) Addition of zswap_compress_pages() that calls zswap_compress()
> for
> > > >> each
> > > >>>>      page, and returns false if any zswap_compress() fails, so
> > > >>>>      zswap_store_page() can cleanup resources allocated and return
> an
> > > >> error
> > > >>>>      status to zswap_store().
> > > >>>>
> > > >>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
> > > >>>>      in zswap_store_pages(). This facilitates cleaner error handling
> within
> > > >>>>      zswap_store_pages(), which will become important for IAA
> compress
> > > >>>>      batching in [1].
> > > >>>>
> > > >>>> [1]: https://patchwork.kernel.org/project/linux-
> > > mm/list/?series=911935
> > > >>>>
> > > >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > >>>> ---
> > > >>>>    mm/zswap.c | 93
> +++++++++++++++++++++++++++++++++++++++++-
> > > ---
> > > >> ---------
> > > >>>>    1 file changed, 71 insertions(+), 22 deletions(-)
> > > >>>>
> > > >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> > > >>>> index b09d1023e775..db80c66e2205 100644
> > > >>>> --- a/mm/zswap.c
> > > >>>> +++ b/mm/zswap.c
> > > >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct
> work_struct
> > > >> *w)
> > > >>>>    * main API
> > > >>>>    **********************************/
> > > >>>>
> > > >>>> +static bool zswap_compress_pages(struct page *pages[],
> > > >>>> +                              struct zswap_entry *entries[],
> > > >>>> +                              u8 nr_pages,
> > > >>>> +                              struct zswap_pool *pool)
> > > >>>> +{
> > > >>>> +     u8 i;
> > > >>>> +
> > > >>>> +     for (i = 0; i < nr_pages; ++i) {
> > > >>>> +             if (!zswap_compress(pages[i], entries[i], pool))
> > > >>>> +                     return false;
> > > >>>> +     }
> > > >>>> +
> > > >>>> +     return true;
> > > >>>> +}
> > > >>>
> > > >>> How about introducing a `zswap_compress_folio()` interface which
> > > >>> can be used by `zswap_store()`?
> > > >>> ```
> > > >>> zswap_store()
> > > >>>          nr_pages = folio_nr_pages(folio)
> > > >>>
> > > >>>          entries = zswap_alloc_entries(nr_pages)
> > > >>>
> > > >>>          ret = zswap_compress_folio(folio, entries, pool)
> > > >>>
> > > >>>          // store entries into xarray and LRU list
> > > >>> ```
> > > >>>
> > > >>> And this version `zswap_compress_folio()` is very simple for now:
> > > >>> ```
> > > >>> zswap_compress_folio()
> > > >>>          nr_pages = folio_nr_pages(folio)
> > > >>>
> > > >>>          for (index = 0; index < nr_pages; ++index) {
> > > >>>                  struct page *page = folio_page(folio, index);
> > > >>>
> > > >>>                  if (!zswap_compress(page, &entries[index], pool))
> > > >>>                          return false;
> > > >>>          }
> > > >>>
> > > >>>          return true;
> > > >>> ```
> > > >>> This can be easily extended to support your "batched" version.
> > > >>>
> > > >>> Then the old `zswap_store_page()` could be removed.
> > > >>>
> > > >>> The good point is simplicity, that we don't need to slice folio
> > > >>> into multiple batches, then repeat the common operations for each
> > > >>> batch, like preparing entries, storing into xarray and LRU list...
> > > >>
> > > >> +1
> > > >
> > > > Thanks for the code review comments. One question though: would
> > > > it make sense to trade-off the memory footprint cost with the code
> > > > simplification? For instance, lets say we want to store a 64k folio.
> > > > We would allocate memory for 16 zswap entries, and lets say one of
> > > > the compressions fails, we would deallocate memory for all 16 zswap
> > > > entries. Could this lead to zswap_entry kmem_cache starvation and
> > > > subsequent zswap_store() failures in multiple processes scenarios?
> > >
> > > Ah, I get your consideration. But it's the unlikely case, right?
> > >
> > > If the case you mentioned above happens a lot, I think yes, we should
> > > optimize its memory footprint to avoid allocation and deallocation.
> >
> > Thanks Chengming. I see your point. Let me gather performance data
> > for the two options, and share.
> 
> Yeah I think we shouldn't optimize for the uncommon case, not until
> there's a real problem that needs fixing.

Agreed.

> 
> >
> > >
> > > On the other hand, we should consider a folio would be compressed
> > > successfully in most cases. So we have to allocate all entries
> > > eventually.
> > >
> > > Based on your consideration, I think your way is ok too, although
> > > I think the patch 2/2 should be dropped, since it hides pages loop
> > > in smaller functions, as Yosry mentioned too.
> >
> > My main intent with patch 2/2 was to set up the error handling
> > path to be common and simpler, whether errors were encountered
> > during compression/zpool_malloc/xarray store. Hence, I initialize the
> > allocated zswap_entry's handle in zswap_alloc_entries() to ERR_PTR(-
> EINVAL),
> > so it is easy for the common error handling code in patch 2 to determine
> > if the handle was allocated (and hence needs to be freed). This benefits
> > the batching code by eliminating the need to maintain state as to which
> > stage of zswap_store_pages() sees an error, based on which resources
> > would need to be deleted.
> >
> > My key consideration is to keep the batching error handling code simple,
> > hence these changes in patch 2. The changes described above would
> > help batching, and should not impact the non-batching case, as indicated
> > by the regression testing data I've included in the cover letter.
> >
> > I don't mind inlining the implementation of the helper functions, as I
> > mentioned in my response to Yosry. I am hoping the error handling
> > simplifications are acceptable, since they will help the batching code.
> 
> I think having the loops open-coded should still be better than
> separate helpers. But I understand not wanting to have the loops
> directly in zswap_store(), as the error handling would be simpler if
> we do it in a separate function like zswap_store_pages().
> 
> How about we just move the loop from  zswap_store() to
> zswap_store_page() and call it zswap_store_folio()? When batching is
> added I imagine we may need to split the loop into two loops before
> and after zswap_compress_folio(), which isn't very neat but is
> probably fine.

Sure, this sounds like a good way to organize the code. I will proceed
as suggested.

Thanks,
Kanchana