> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Sunday, August 24, 2025 10:39 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Nhat Pham <nphamcs@gmail.com>; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; hannes@cmpxchg.org; yosry.ahmed@linux.dev;
> chengming.zhou@linux.dev; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; ying.huang@linux.alibaba.com; akpm@linux-foundation.org;
> senozhatsky@chromium.org; linux-crypto@vger.kernel.org; davem@davemloft.net;
> clabbe@baylibre.com; ardb@kernel.org; ebiggers@google.com; surenb@google.com;
> Accardi, Kristen C <kristen.c.accardi@intel.com>; Gomes, Vinicius <vinicius.gomes@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v11 00/24] zswap compression batching with optimized iaa_crypto driver
>
> On Fri, Aug 22, 2025 at 07:26:34PM +0000, Sridhar, Kanchana P wrote:
> >
> > 1) The zswap per-CPU acomp_ctx has two sg_tables added, one each for
> > inputs/outputs, with nents set to the pool->compr_batch_size (1 for
> > software compressors). This per-CPU data incurs additional memory
> > overhead per-CPU; however, this is memory that would anyway be
> > allocated on the stack in zswap_compress(), and it is less memory
> > overhead than the latter because we know exactly how many sg_table
> > scatterlists to allocate for the given pool (assuming we don't
> > kmalloc in zswap_compress()). I will make sure to quantify the
> > overhead in v12's commit logs.
>
> There is no need for any SG lists for the source. The folio should
> be submitted as the source.
>
> So only the destination requires an SG list.
>
> > 6) "For the source, nothing needs to be done because the folio could
> > be passed in as is." As far as I know, this cannot be accomplished
> > without modifications to the crypto API for software compressors,
> > because compressed buffers need to be stored in the zswap/zram
> > zs_pools at PAGE_SIZE granularity.
>
> Sure. But all it needs is one central fallback path in the acompress
> API. I can do this for you.

Thanks Herbert, for reviewing the approach. IIUC, we should follow
these constraints:

1) The folio should be submitted as the source.

2) For the destination, construct an SG list of output buffers and pass
   that in. The rule should be that the SG list must contain a
   sufficient number of pages for the compression output based on the
   given unit size (PAGE_SIZE for zswap).

For PMD folios, there would be 512 compression outputs. In this case,
would we need to pass in an SG list that can contain 512 compression
outputs after calling the acompress API once?

If so, this might not be feasible for zswap, since there are only
"batch_size" pre-allocated per-CPU output buffers, where "batch_size"
is the maximum number of pages that can be compressed in one call to
the algorithm (1 for software compressors). Hence, gathering all 512
compression outputs may not be possible in a single invocation of
crypto_acomp_compress(). Is the suggestion to allocate 512 per-CPU
output buffers to overcome this? That could be very expensive in
memory. Please let me know if I am missing something.

Thanks for offering to make the necessary changes to the acompress
API. Hoping we can sync on the approach!

Best regards,
Kanchana

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Mon, Aug 25, 2025 at 06:12:19PM +0000, Sridhar, Kanchana P wrote:
>
> Thanks Herbert, for reviewing the approach. IIUC, we should follow
> these constraints:
>
> 1) The folio should be submitted as the source.
>
> 2) For the destination, construct an SG list of output buffers and pass
>    that in. The rule should be that the SG list must contain a
>    sufficient number of pages for the compression output based on the
>    given unit size (PAGE_SIZE for zswap).
>
> For PMD folios, there would be 512 compression outputs. In this case,
> would we need to pass in an SG list that can contain 512 compression
> outputs after calling the acompress API once?

Eventually yes :)

But for now we're just replicating your current patch-set, so
the folio should come with an offset and a length restriction,
and correspondingly the destination SG list should contain the
same number of pages as there are in your current patch-set.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
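For reference, sizing the per-CPU destination SG table to the pool's
batch size could look roughly like the sketch below, following the
constraint above that only the destination needs an SG list. The struct
and helper names here are hypothetical and are not taken from the
patch-set:

	/*
	 * Hypothetical per-CPU batching context; the struct and field names
	 * are illustrative only and need not match the actual patch-set.
	 */
	struct acomp_batch_ctx {
		struct sg_table sg_outputs;	/* destination SG list only */
		u8 **buffers;			/* compr_batch_size buffers of 2 * PAGE_SIZE each */
	};

	/* Size the destination SG table to the pool's batch size (1 for software). */
	static int acomp_batch_ctx_init(struct acomp_batch_ctx *ctx,
					unsigned int compr_batch_size)
	{
		unsigned int i;

		if (sg_alloc_table(&ctx->sg_outputs, compr_batch_size, GFP_KERNEL))
			return -ENOMEM;

		ctx->buffers = kcalloc(compr_batch_size, sizeof(*ctx->buffers), GFP_KERNEL);
		if (!ctx->buffers)
			goto err_table;

		for (i = 0; i < compr_batch_size; i++) {
			ctx->buffers[i] = kmalloc(2 * PAGE_SIZE, GFP_KERNEL);
			if (!ctx->buffers[i])
				goto err_buffers;
		}
		return 0;

	err_buffers:
		while (i--)
			kfree(ctx->buffers[i]);
		kfree(ctx->buffers);
	err_table:
		sg_free_table(&ctx->sg_outputs);
		return -ENOMEM;
	}

A PMD folio would then be handled by looping over such batch-sized
chunks, rather than by pre-allocating 512 output buffers per CPU.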
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Monday, August 25, 2025 6:13 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Nhat Pham <nphamcs@gmail.com>; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; hannes@cmpxchg.org; yosry.ahmed@linux.dev;
> chengming.zhou@linux.dev; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; ying.huang@linux.alibaba.com; akpm@linux-foundation.org;
> senozhatsky@chromium.org; linux-crypto@vger.kernel.org; davem@davemloft.net;
> clabbe@baylibre.com; ardb@kernel.org; ebiggers@google.com; surenb@google.com;
> Accardi, Kristen C <kristen.c.accardi@intel.com>; Gomes, Vinicius <vinicius.gomes@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v11 00/24] zswap compression batching with optimized iaa_crypto driver
>
> On Mon, Aug 25, 2025 at 06:12:19PM +0000, Sridhar, Kanchana P wrote:
> >
> > Thanks Herbert, for reviewing the approach. IIUC, we should follow
> > these constraints:
> >
> > 1) The folio should be submitted as the source.
> >
> > 2) For the destination, construct an SG list of output buffers and pass
> >    that in. The rule should be that the SG list must contain a
> >    sufficient number of pages for the compression output based on the
> >    given unit size (PAGE_SIZE for zswap).
> >
> > For PMD folios, there would be 512 compression outputs. In this case,
> > would we need to pass in an SG list that can contain 512 compression
> > outputs after calling the acompress API once?
>
> Eventually yes :)
>
> But for now we're just replicating your current patch-set, so
> the folio should come with an offset and a length restriction,
> and correspondingly the destination SG list should contain the
> same number of pages as there are in your current patch-set.

Thanks Herbert. Just want to make sure I understand this. Are you
referring to replacing sg_set_page() for the input with sg_set_folio()?
We have to pass in a scatterlist for the acomp_req->src..

This is how the converged zswap_compress() code would look for batch
compression of "nr_pages" in "folio", starting at index "start". The
input SG list will contain "nr_comps" pages: nr_comps is 1 for software
compressors and 8 for IAA. The destination SG list will contain an
equivalent number of buffers (each PAGE_SIZE * 2 in size).

Based on your suggestions, I was able to come up with a unified
implementation for software and hardware compressors; the input SG
list set up with sg_set_folio() is a key aspect of this:

static bool zswap_compress(struct folio *folio, long start, unsigned int nr_pages,
			   struct zswap_entry *entries[], struct zswap_pool *pool,
			   int node_id)
{
	unsigned int nr_comps = min(nr_pages, pool->compr_batch_size);
	unsigned int dlens[ZSWAP_MAX_BATCH_SIZE];
	struct crypto_acomp_ctx *acomp_ctx;
	struct zpool *zpool = pool->zpool;
	struct scatterlist *sg;
	unsigned int i, j, k;
	gfp_t gfp;
	int err;

	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;

	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);

	mutex_lock(&acomp_ctx->mutex);

	prefetchw(acomp_ctx->sg_inputs->sgl);
	prefetchw(acomp_ctx->sg_outputs->sgl);

	/*
	 * Note:
	 * [i] refers to the incoming batch space and is used to
	 * index into the folio pages and @entries.
	 *
	 * [k] refers to the @acomp_ctx space, as determined by
	 * @pool->compr_batch_size, and is used to index into
	 * @acomp_ctx->buffers and @dlens.
	 */
	for (i = 0; i < nr_pages; i += nr_comps) {
		for_each_sg(acomp_ctx->sg_inputs->sgl, sg, nr_comps, k)
			sg_set_folio(sg, folio, PAGE_SIZE, (start + k + i) * PAGE_SIZE);

		/*
		 * We need PAGE_SIZE * 2 here because there may be an over-compression
		 * case, and hardware accelerators may not check the dst buffer size,
		 * so give the dst buffer enough length to avoid buffer overflow.
		 */
		for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k)
			sg_set_buf(sg, acomp_ctx->buffers[k], PAGE_SIZE * 2);

		acomp_request_set_params(acomp_ctx->req,
					 acomp_ctx->sg_inputs->sgl,
					 acomp_ctx->sg_outputs->sgl,
					 nr_comps * PAGE_SIZE,
					 nr_comps * PAGE_SIZE);

		err = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req),
				      &acomp_ctx->wait);

		if (unlikely(err)) {
			if (nr_comps == 1)
				dlens[0] = err;
			goto compress_error;
		}

		if (nr_comps == 1)
			dlens[0] = acomp_ctx->req->dlen;
		else
			for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k)
				dlens[k] = sg->length;

		[ store each compressed page in zpool ]

I quickly tested this with usemem, 30 processes. Switching from
sg_set_page() to sg_set_folio() does cause a 15% throughput regression
for IAA and a 2% regression for zstd:

usemem30/64K folios/deflate-iaa/Avg throughput (KB/s):

  sg_set_page():  357,141
  sg_set_folio(): 304,696

usemem30/64K folios/zstd/Avg throughput (KB/s):

  sg_set_page():  230,760
  sg_set_folio(): 226,246

In my experience, zswap_compress() is highly performance-critical code,
and even the smallest compute additions can have a significant impact
on workload performance and sys time. Given the code simplification and
unification that your SG list suggestions have enabled, may I understand
better why sg_set_folio() is preferred?

Again, my apologies if I have misunderstood your suggestion, but I think
it is worth getting this clarified so we are all in agreement.

Thanks and best regards,
Kanchana

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Tue, Aug 26, 2025 at 04:09:45AM +0000, Sridhar, Kanchana P wrote:
>
> Thanks Herbert. Just want to make sure I understand this. Are you
> referring to replacing sg_set_page() for the input with sg_set_folio()?
> We have to pass in a scatterlist for the acomp_req->src..

I'm talking about acomp_request_set_src_folio. You can pass just
a portion of a folio by specifying an offset and a length.

> 	for (i = 0; i < nr_pages; i += nr_comps) {
> 		for_each_sg(acomp_ctx->sg_inputs->sgl, sg, nr_comps, k)
> 			sg_set_folio(sg, folio, PAGE_SIZE, (start + k + i) * PAGE_SIZE);
>
> 		/*
> 		 * We need PAGE_SIZE * 2 here because there may be an over-compression
> 		 * case, and hardware accelerators may not check the dst buffer size,
> 		 * so give the dst buffer enough length to avoid buffer overflow.
> 		 */
> 		for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k)
> 			sg_set_buf(sg, acomp_ctx->buffers[k], PAGE_SIZE * 2);
>
> 		acomp_request_set_params(acomp_ctx->req,
> 					 acomp_ctx->sg_inputs->sgl,
> 					 acomp_ctx->sg_outputs->sgl,
> 					 nr_comps * PAGE_SIZE,
> 					 nr_comps * PAGE_SIZE);

I meant something more like:

	acomp_request_set_src_folio(req, folio, start_offset,
				    nr_comps * PAGE_SIZE);
	acomp_request_set_dst_sg(req, acomp_ctx->sg_outputs->sgl,
				 nr_comps * PAGE_SIZE);
	acomp_request_set_unit_size(req, PAGE_SIZE);

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
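For reference, a minimal sketch of how the per-batch loop in
zswap_compress() might look with this interface.
acomp_request_set_src_folio() and acomp_request_set_dst_sg() are
existing acompress helpers; acomp_request_set_unit_size() is the new
helper proposed above and is assumed here, as is the convention that
per-unit output lengths are reported back via the destination SG
entries:

	for (i = 0; i < nr_pages; i += nr_comps) {
		/* Destination: one pre-allocated 2 * PAGE_SIZE buffer per unit. */
		for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k)
			sg_set_buf(sg, acomp_ctx->buffers[k], PAGE_SIZE * 2);

		/* Source: the relevant portion of the folio; no input SG list needed. */
		acomp_request_set_src_folio(acomp_ctx->req, folio,
					    (start + i) * PAGE_SIZE,
					    nr_comps * PAGE_SIZE);
		acomp_request_set_dst_sg(acomp_ctx->req, acomp_ctx->sg_outputs->sgl,
					 nr_comps * PAGE_SIZE);
		/* Proposed helper (not yet upstream): compress in PAGE_SIZE units. */
		acomp_request_set_unit_size(acomp_ctx->req, PAGE_SIZE);

		err = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req),
				      &acomp_ctx->wait);
		if (unlikely(err))
			goto compress_error;

		/* Assumes each unit's compressed length lands in its dst SG entry. */
		for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k)
			dlens[k] = sg->length;

		/* ... store each compressed page in the zpool, as before ... */
	}

With this shape, the per-page input SG setup (and the input-side
prefetch) goes away entirely, which may also sidestep the sg_set_folio()
overhead measured earlier in the thread.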
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Monday, August 25, 2025 9:15 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Nhat Pham <nphamcs@gmail.com>; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; hannes@cmpxchg.org; yosry.ahmed@linux.dev;
> chengming.zhou@linux.dev; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; ying.huang@linux.alibaba.com; akpm@linux-foundation.org;
> senozhatsky@chromium.org; linux-crypto@vger.kernel.org; davem@davemloft.net;
> clabbe@baylibre.com; ardb@kernel.org; ebiggers@google.com; surenb@google.com;
> Accardi, Kristen C <kristen.c.accardi@intel.com>; Gomes, Vinicius <vinicius.gomes@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v11 00/24] zswap compression batching with optimized iaa_crypto driver
>
> On Tue, Aug 26, 2025 at 04:09:45AM +0000, Sridhar, Kanchana P wrote:
> >
> > Thanks Herbert. Just want to make sure I understand this. Are you
> > referring to replacing sg_set_page() for the input with sg_set_folio()?
> > We have to pass in a scatterlist for the acomp_req->src..
>
> I'm talking about acomp_request_set_src_folio. You can pass just
> a portion of a folio by specifying an offset and a length.
>
> > 	for (i = 0; i < nr_pages; i += nr_comps) {
> > 		for_each_sg(acomp_ctx->sg_inputs->sgl, sg, nr_comps, k)
> > 			sg_set_folio(sg, folio, PAGE_SIZE, (start + k + i) * PAGE_SIZE);
> >
> > 		/*
> > 		 * We need PAGE_SIZE * 2 here because there may be an over-compression
> > 		 * case, and hardware accelerators may not check the dst buffer size,
> > 		 * so give the dst buffer enough length to avoid buffer overflow.
> > 		 */
> > 		for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k)
> > 			sg_set_buf(sg, acomp_ctx->buffers[k], PAGE_SIZE * 2);
> >
> > 		acomp_request_set_params(acomp_ctx->req,
> > 					 acomp_ctx->sg_inputs->sgl,
> > 					 acomp_ctx->sg_outputs->sgl,
> > 					 nr_comps * PAGE_SIZE,
> > 					 nr_comps * PAGE_SIZE);
>
> I meant something more like:
>
> 	acomp_request_set_src_folio(req, folio, start_offset,
> 				    nr_comps * PAGE_SIZE);
> 	acomp_request_set_dst_sg(req, acomp_ctx->sg_outputs->sgl,
> 				 nr_comps * PAGE_SIZE);
> 	acomp_request_set_unit_size(req, PAGE_SIZE);

Ok, I get it now :) Thanks. I will try this out, and pending any issues
that may arise from testing, I might be all set to put together v12.

Thanks again, Herbert, I appreciate it.

Best regards,
Kanchana

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt