From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
    ying.huang@linux.alibaba.com, akpm@linux-foundation.org,
    senozhatsky@chromium.org, sj@kernel.org, kasong@tencent.com,
    linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
    davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
    ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com,
    vinicius.gomes@intel.com, giovanni.cabiddu@intel.com
Cc: wajdi.k.feghali@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v14 18/26] crypto: acomp, iaa - crypto_acomp integration of IAA Batching.
Date: Sat, 24 Jan 2026 19:35:29 -0800
Message-Id: <20260125033537.334628-19-kanchana.p.sridhar@intel.com>
In-Reply-To: <20260125033537.334628-1-kanchana.p.sridhar@intel.com>
References: <20260125033537.334628-1-kanchana.p.sridhar@intel.com>

This commit integrates IAA compress/decompress batching with the
crypto_acomp API, as per the discussions in [1]. Further, IAA now sets
crypto_alg flags to indicate support for segmentation.

To provide context from the perspective of a kernel user such as zswap:
zswap interfaces with the batching API by setting up the acomp_req
through these crypto API to designate multiple src/dst SG lists
representing the batch being sent to iaa_crypto:

  acomp_request_set_src_folio()
  acomp_request_set_dst_sg()
  acomp_request_set_unit_size()

before invoking batch compression through the existing
crypto_acomp_compress() interface.

Within crypto_acomp_compress(), an acomp_req whose tfm supports
segmentation is further tested for an "slen" that is greater than the
request's unit_size. If so, acomp_do_req_batch_parallel() is invoked,
analogous to the acomp_do_req_chain() case.

acomp_do_req_batch_parallel() creates a wait_queue_head
"batch_parallel_wq", stores it in the acomp_req's "__ctx", and then
calls tfm->compress()/tfm->decompress(). The iaa_crypto driver alg's
compress() implementation submits the batch's requests and immediately
returns to acomp_do_req_batch_parallel(), which then waits for the
"batch_parallel_wq" to be notified by a tfm->batch_completed() event.

To support this, a batch_completed() API is added to "struct
crypto_acomp" and "struct acomp_alg". The iaa_crypto driver alg's
batch_completed() implementation waits for each batch sub-request to
complete and notifies the batch_parallel_wq. If any sub-request has an
error, -EINVAL is returned to the acomp_req's callback; otherwise, 0.
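For illustration, here is a minimal caller-side sketch of how a
zswap-like user could drive a compress batch through this interface.
This is not zswap code: the function name, parameters, and the
pre-chained @dst_sgs list are hypothetical, the signature assumed for
acomp_request_set_unit_size() is (req, size), and request allocation
and error handling are elided.

	/*
	 * Hypothetical sketch. Assumes @folio spans @nr_pages PAGE_SIZE
	 * units and @dst_sgs chains one dst SG list per unit.
	 */
	static int batch_compress_sketch(struct crypto_acomp *tfm,
					 struct acomp_req *req,
					 struct folio *folio,
					 struct scatterlist *dst_sgs,
					 unsigned int nr_pages)
	{
		DECLARE_CRYPTO_WAIT(wait);

		acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
					   crypto_req_done, &wait);
		acomp_request_set_src_folio(req, folio, 0, nr_pages * PAGE_SIZE);
		acomp_request_set_dst_sg(req, dst_sgs, nr_pages * PAGE_SIZE);
		acomp_request_set_unit_size(req, PAGE_SIZE);

		/*
		 * slen (nr_pages * PAGE_SIZE) exceeds unit_size (PAGE_SIZE)
		 * for nr_pages > 1, so a segmentation-capable tfm takes the
		 * parallel batching path. On completion, each dst SG list's
		 * length holds that unit's compressed length or error.
		 */
		return crypto_wait_req(crypto_acomp_compress(req), &wait);
	}

With nr_pages == 1, slen equals unit_size and the same call takes the
sequential single-page path, which is what lets a zswap call site stay
identical for batching and non-batching compressors.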
[1]: https://lore.kernel.org/all/aRqSqQxR4eHzvb2g@gondor.apana.org.au/

Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
---
 crypto/acompress.c                         |  63 ++++++++++
 drivers/crypto/intel/iaa/iaa_crypto.h      |   3 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 137 +++++++++++++++++++--
 include/crypto/acompress.h                 |   7 ++
 include/crypto/internal/acompress.h        |   7 ++
 5 files changed, 210 insertions(+), 7 deletions(-)

diff --git a/crypto/acompress.c b/crypto/acompress.c
index cfb8ede02cf4..c48a1a20e21f 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -105,6 +105,7 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
 
 	acomp->compress = alg->compress;
 	acomp->decompress = alg->decompress;
+	acomp->batch_completed = alg->batch_completed;
 	acomp->reqsize = alg->base.cra_reqsize;
 
 	acomp->base.exit = crypto_acomp_exit_tfm;
@@ -291,6 +292,65 @@ static __always_inline int acomp_do_req_chain(struct acomp_req *req, bool comp)
 	return acomp_reqchain_finish(req, err);
 }
 
+static int acomp_do_req_batch_parallel(struct acomp_req *req, bool comp)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	unsigned long *bpwq_addr = acomp_request_ctx(req);
+	wait_queue_head_t batch_parallel_wq;
+	int ret;
+
+	init_waitqueue_head(&batch_parallel_wq);
+	*bpwq_addr = (unsigned long)&batch_parallel_wq;
+
+	ret = comp ? tfm->compress(req) : tfm->decompress(req);
+
+	wait_event(batch_parallel_wq, tfm->batch_completed(req, comp));
+
+	if (req->slen < 0)
+		ret |= -EINVAL;
+
+	return ret;
+}
+
+/**
+ * Please note:
+ * ============
+ *
+ * 1) If @req->unit_size is 0, there is no impact to existing acomp users.
+ *
+ * 2) If @req->unit_size is non-0 (e.g. zswap compress batching) and
+ *    @req->src and @req->dst are scatterlists:
+ *
+ *    a) Algorithms that do not support segmentation:
+ *
+ *       We call acomp_do_req_chain(), which handles the trivial case when
+ *       the caller has passed exactly one segment. The dst SG list's length is
+ *       set to the compression error/compressed length for that segment.
+ *
+ *    b) Algorithms that support segmentation:
+ *
+ *       If the source length is more than @req->unit_size,
+ *       acomp_do_req_batch_parallel() is invoked: this calls the tfm's
+ *       compress() API, which uses @req->slen being greater than
+ *       @req->unit_size to ascertain that it needs to do batching. The
+ *       algorithm's compress() implementation submits the batch's
+ *       sub-requests for compression and returns.
+ *
+ *       Algorithms that support batching must provide a batch_completed() API.
+ *       When the batch's compression sub-requests have completed, they must
+ *       notify a wait_queue using the batch_completed() API. The batching tfm
+ *       implementation must set the dst SG lists to contain the individual
+ *       sub-requests' error/compressed lengths.
+ *
+ *       If the source length == @req->unit_size, the tfm's compress() API is
+ *       invoked. The assumption is that segmentation algorithms will internally
+ *       set the dst SG list's length to indicate error/compressed length in
+ *       this case, similar to the batching case.
+ *
+ * 3) To prevent functional/performance regressions, we preserve existing
+ *    behavior in all other cases, such as when @req->unit_size is non-0 and
+ *    @req->src and/or @req->dst is virtual, instead of returning an error.
+ */
 int crypto_acomp_compress(struct acomp_req *req)
 {
 	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
@@ -302,6 +362,9 @@ int crypto_acomp_compress(struct acomp_req *req)
 	if (!crypto_acomp_req_seg(tfm))
 		return acomp_do_req_chain(req, true);
 
+	if (likely(req->unit_size && req->slen > req->unit_size && tfm->batch_completed))
+		return acomp_do_req_batch_parallel(req, true);
+
 	return tfm->compress(req);
 }
 
diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index db83c21e92f1..d85a8f1cbb93 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -69,10 +69,13 @@
  * IAA. In other words, don't make any assumptions, and protect
  * compression/decompression data.
  *
+ * @data: Driver internal data to interface with crypto_acomp.
+ *
  */
 struct iaa_batch_ctx {
 	struct iaa_req **reqs;
 	struct mutex mutex;
+	void *data;
 };
 
 #define IAA_COMP_MODES_MAX	IAA_MODE_NONE
diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 8d83a1ea15d7..915bf9b17b39 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -2524,6 +2524,71 @@ static void compression_ctx_init(struct iaa_compression_ctx *ctx, enum iaa_mode
  * Interfaces to crypto_alg and crypto_acomp.
  *********************************************/
 
+static __always_inline int iaa_crypto_acomp_acompress_batch(
+				struct iaa_compression_ctx *ctx,
+				struct iaa_req *parent_req,
+				struct iaa_req **reqs,
+				unsigned int unit_size)
+{
+	int nr_reqs = parent_req->slen / unit_size;
+
+	return iaa_comp_submit_acompress_batch(ctx, parent_req, reqs, nr_reqs, unit_size);
+}
+
+static __always_inline int iaa_crypto_acomp_adecompress_batch(
+				struct iaa_compression_ctx *ctx,
+				struct iaa_req *parent_req,
+				struct iaa_req **reqs,
+				unsigned int unit_size)
+{
+	int nr_reqs = parent_req->dlen / unit_size;
+
+	return iaa_comp_submit_adecompress_batch(ctx, parent_req, reqs, nr_reqs);
+}
+
+static bool iaa_crypto_acomp_batch_completed(struct acomp_req *areq, bool comp)
+{
+	unsigned long *cpu_ctx_addr = acomp_request_ctx(areq);
+	struct iaa_batch_ctx *cpu_ctx = (struct iaa_batch_ctx *)*cpu_ctx_addr;
+	wait_queue_head_t *batch_parallel_wq = (wait_queue_head_t *)cpu_ctx->data;
+	struct iaa_req **reqs = cpu_ctx->reqs;
+	int nr_reqs = (comp ? areq->slen : areq->dlen) / areq->unit_size;
+
+	/*
+	 * Since both compress and decompress require the eventual
+	 * caller (zswap) to verify @areq->dlen, we use @areq->slen to
+	 * flag the batch's success/error to crypto_acomp, which will
+	 * return this as the @err status to the crypto_acomp callback
+	 * function.
+	 */
+	if (iaa_comp_batch_completed(NULL, reqs, nr_reqs))
+		areq->slen = -EINVAL;
+
+	/*
+	 * Set the acomp_req's dlen to be the first SG list's
+	 * compressed/decompressed length/error value to enable zswap code
+	 * equivalence for non-batching and batching acomp_algs.
+	 */
+	areq->dlen = areq->dst->length;
+
+	/* All sub-requests have finished. Notify the @batch_parallel_wq. */
+	if (waitqueue_active(batch_parallel_wq))
+		wake_up(batch_parallel_wq);
+
+	mutex_unlock(&cpu_ctx->mutex);
+
+	return true;
+}
+
+/*
+ * Main compression API for kernel users of crypto_acomp, such as zswap.
+ *
+ * crypto_acomp_compress() calls into this procedure for:
+ * - Sequential compression of a single page,
+ * - Parallel batch compression of multiple pages.
+ *
+ * @areq: asynchronous compress request
+ */
 static int iaa_crypto_acomp_acompress_main(struct acomp_req *areq)
 {
 	struct crypto_tfm *tfm = areq->base.tfm;
@@ -2534,14 +2599,47 @@ static int iaa_crypto_acomp_acompress_main(struct acomp_req *areq)
 	if (iaa_alg_is_registered(crypto_tfm_alg_driver_name(tfm), &idx)) {
 		ctx = iaa_ctx[idx];
 
-		acomp_to_iaa(areq, &parent_req, ctx);
-		ret = iaa_comp_acompress(ctx, &parent_req);
-		iaa_to_acomp(unlikely(ret) ? ret : parent_req.dlen, areq);
+		if (likely(areq->slen == areq->unit_size) || !areq->unit_size) {
+			acomp_to_iaa(areq, &parent_req, ctx);
+			ret = iaa_comp_acompress(ctx, &parent_req);
+			iaa_to_acomp(unlikely(ret) ? ret : parent_req.dlen, areq);
+		} else {
+			struct iaa_batch_ctx *cpu_ctx = raw_cpu_ptr(iaa_batch_ctx);
+			struct iaa_req **reqs;
+			unsigned long *cpu_ctx_addr, *bpwq_addr;
+
+			acomp_to_iaa(areq, &parent_req, ctx);
+
+			mutex_lock(&cpu_ctx->mutex);
+
+			bpwq_addr = acomp_request_ctx(areq);
+			/* Save the wait_queue_head. */
+			cpu_ctx->data = (wait_queue_head_t *)*bpwq_addr;
+
+			reqs = cpu_ctx->reqs;
+
+			ret = iaa_crypto_acomp_acompress_batch(ctx,
+							       &parent_req,
+							       reqs,
+							       areq->unit_size);
+
+			cpu_ctx_addr = acomp_request_ctx(areq);
+			*cpu_ctx_addr = (unsigned long)cpu_ctx;
+		}
 	}
 
 	return ret;
 }
 
+/*
+ * Main decompression API for kernel users of crypto_acomp, such as zswap.
+ *
+ * crypto_acomp_decompress() calls into this procedure for:
+ * - Sequential decompression of a single buffer,
+ * - Parallel batch decompression of multiple buffers.
+ *
+ * @areq: asynchronous decompress request
+ */
 static int iaa_crypto_acomp_adecompress_main(struct acomp_req *areq)
 {
 	struct crypto_tfm *tfm = areq->base.tfm;
@@ -2552,9 +2650,33 @@ static int iaa_crypto_acomp_adecompress_main(struct acomp_req *areq)
 	if (iaa_alg_is_registered(crypto_tfm_alg_driver_name(tfm), &idx)) {
 		ctx = iaa_ctx[idx];
 
-		acomp_to_iaa(areq, &parent_req, ctx);
-		ret = iaa_comp_adecompress(ctx, &parent_req);
-		iaa_to_acomp(parent_req.dlen, areq);
+		if (likely(areq->dlen == areq->unit_size) || !areq->unit_size) {
+			acomp_to_iaa(areq, &parent_req, ctx);
+			ret = iaa_comp_adecompress(ctx, &parent_req);
+			iaa_to_acomp(parent_req.dlen, areq);
+		} else {
+			struct iaa_batch_ctx *cpu_ctx = raw_cpu_ptr(iaa_batch_ctx);
+			struct iaa_req **reqs;
+			unsigned long *cpu_ctx_addr, *bpwq_addr;
+
+			acomp_to_iaa(areq, &parent_req, ctx);
+
+			mutex_lock(&cpu_ctx->mutex);
+
+			bpwq_addr = acomp_request_ctx(areq);
+			/* Save the wait_queue_head. */
+			cpu_ctx->data = (wait_queue_head_t *)*bpwq_addr;
+
+			reqs = cpu_ctx->reqs;
+
+			ret = iaa_crypto_acomp_adecompress_batch(ctx,
+								 &parent_req,
+								 reqs,
+								 areq->unit_size);
+
+			cpu_ctx_addr = acomp_request_ctx(areq);
+			*cpu_ctx_addr = (unsigned long)cpu_ctx;
+		}
 	}
 
 	return ret;
@@ -2574,10 +2696,11 @@ static struct acomp_alg iaa_acomp_fixed_deflate = {
 	.init			= iaa_crypto_acomp_init_fixed,
 	.compress		= iaa_crypto_acomp_acompress_main,
 	.decompress		= iaa_crypto_acomp_adecompress_main,
+	.batch_completed	= iaa_crypto_acomp_batch_completed,
 	.base			= {
 		.cra_name		= "deflate",
 		.cra_driver_name	= "deflate-iaa",
-		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_flags		= CRYPTO_ALG_ASYNC | CRYPTO_ALG_REQ_SEG,
 		.cra_ctxsize		= sizeof(struct iaa_compression_ctx),
 		.cra_reqsize		= sizeof(u32),
 		.cra_module		= THIS_MODULE,
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 86e4932cd112..752110a7719c 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -109,6 +109,12 @@ struct acomp_req {
  *
  * @compress:	Function performs a compress operation
  * @decompress:	Function performs a de-compress operation
+ * @batch_completed: Waits for batch completion of parallel
+ *		     compress/decompress requests submitted via
+ *		     @compress/@decompress. Returns bool status
+ *		     of all batch sub-requests having completed.
+ *		     Returns an error code in @req->slen if any
+ *		     of the sub-requests completed with an error.
  * @reqsize:	Context size for (de)compression requests
  * @fb:		Synchronous fallback tfm
  * @base:	Common crypto API algorithm data structure
@@ -116,6 +122,7 @@ struct acomp_req {
 struct crypto_acomp {
 	int (*compress)(struct acomp_req *req);
 	int (*decompress)(struct acomp_req *req);
+	bool (*batch_completed)(struct acomp_req *req, bool comp);
 	unsigned int reqsize;
 	struct crypto_tfm base;
 };
diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
index 366dbdb987e8..7c4e14491d59 100644
--- a/include/crypto/internal/acompress.h
+++ b/include/crypto/internal/acompress.h
@@ -28,6 +28,12 @@
  *
  * @compress:	Function performs a compress operation
  * @decompress:	Function performs a de-compress operation
+ * @batch_completed: Waits for batch completion of parallel
+ *		     compress/decompress requests submitted via
+ *		     @compress/@decompress. Returns bool status
+ *		     of all batch sub-requests having completed.
+ *		     Returns an error code in @req->slen if any
+ *		     of the sub-requests completed with an error.
  * @init:	Initialize the cryptographic transformation object.
  *		This function is used to initialize the cryptographic
  *		transformation object. This function is called only once at
@@ -46,6 +52,7 @@
 struct acomp_alg {
 	int (*compress)(struct acomp_req *req);
 	int (*decompress)(struct acomp_req *req);
+	bool (*batch_completed)(struct acomp_req *req, bool comp);
 	int (*init)(struct crypto_acomp *tfm);
 	void (*exit)(struct crypto_acomp *tfm);
 
-- 
2.27.0