From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v5 01/12] crypto: acomp - Add synchronous/asynchronous acomp request chaining.
Date: Fri, 20 Dec 2024 22:31:08 -0800
Message-Id: <20241221063119.29140-2-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>
References: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>

This patch is based on Herbert Xu's request chaining for ahash ("[PATCH 2/6]
crypto: hash - Add request chaining API") [1]. The generic request-chaining
framework provided in the ahash implementation was used as a reference to
develop a similar synchronous request-chaining framework for crypto_acomp.

This commit also adds an asynchronous request-chaining framework and API
that iaa_crypto can use to process chained requests in parallel, so that it
can fully benefit from the multiple compress/decompress engines in Intel IAA
hardware. This yields significant latency improvements for IAA batching
compared to synchronous request chaining.
Usage of acomp request chaining API:
====================================

Any crypto_acomp compressor can use request chaining as follows:

Step 1: Create the request chain:

  Request 0 (the first req in the chain):

  void acomp_reqchain_init(struct acomp_req *req,
                           u32 flags, crypto_completion_t compl,
                           void *data);

  Subsequent requests:

  void acomp_request_chain(struct acomp_req *req,
                           struct acomp_req *head);

Step 2: Process the request chain using the specified compress/decompress
        "op":

  2.a) Synchronous: the chain of requests is processed in series:

       int acomp_do_req_chain(struct acomp_req *req,
                              int (*op)(struct acomp_req *req));

  2.b) Asynchronous: the chain of requests is processed in parallel using a
       submit-poll paradigm:

       int acomp_do_async_req_chain(struct acomp_req *req,
                                    int (*op_submit)(struct acomp_req *req),
                                    int (*op_poll)(struct acomp_req *req));

Request chaining will be used in subsequent patches to implement
compress/decompress batching in the iaa_crypto driver for the two supported
IAA driver sync_modes:

  sync_mode = 'sync' will use (2.a),
  sync_mode = 'async' will use (2.b).

The changes to these files are re-used directly from [1], which is not yet
merged:

  include/crypto/algapi.h
  include/linux/crypto.h

Hence, I am adding Herbert as the co-developer of this acomp request
chaining patch.
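The two processing modes in Step 2 can be illustrated with a small userspace
sketch. It does not use the kernel crypto API: struct mock_req, chain_init(),
chain_add(), run_chain_sync() and run_chain_async() are hypothetical
stand-ins for struct acomp_req, acomp_reqchain_init(), acomp_request_chain(),
acomp_do_req_chain() and acomp_do_async_req_chain(), a plain singly linked
list replaces struct list_head, and the "hardware" is simulated by a
per-request countdown of poll attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct acomp_req: only what chaining needs. */
struct mock_req {
	int id;
	int err;		/* per-request status, like req->base.err */
	int polls_left;		/* simulated device latency (async mode) */
	struct mock_req *next;	/* replaces the kernel's struct list_head */
};

/* Same "not finished yet" codes that the patch's poll loops skip over. */
static bool still_pending(int err)
{
	return err == -EAGAIN || err == -EINPROGRESS || err == -EBUSY;
}

/* Step 1, request 0: mark the head in-flight and start an empty chain. */
static void chain_init(struct mock_req *head)
{
	head->err = -EINPROGRESS;
	head->next = NULL;
}

/* Step 1, subsequent requests: append @req to the chain headed by @head. */
static void chain_add(struct mock_req *req, struct mock_req *head)
{
	struct mock_req *tail = head;

	assert(req != head);	/* cf. BUG_ON(req == req0) in the patch */
	while (tail->next)
		tail = tail->next;
	req->err = -EINPROGRESS;
	req->next = NULL;
	tail->next = req;
}

/* Step 2.a: process the chain in series; each op() runs to completion. */
static void run_chain_sync(struct mock_req *head,
			   int (*op)(struct mock_req *req))
{
	for (struct mock_req *r = head; r; r = r->next)
		r->err = op(r);
}

/* Step 2.b: submit every request first, then poll until none is pending. */
static void run_chain_async(struct mock_req *head,
			    int (*op_submit)(struct mock_req *req),
			    int (*op_poll)(struct mock_req *req))
{
	int pending = 0;

	for (struct mock_req *r = head; r; r = r->next) {
		r->err = op_submit(r);
		if (still_pending(r->err)) {
			r->err = -EINPROGRESS;
			pending++;
		}
	}

	while (pending > 0) {
		for (struct mock_req *r = head; r; r = r->next) {
			int ret;

			if (r->err != -EINPROGRESS)
				continue;
			ret = op_poll(r);
			if (still_pending(ret))
				continue;
			r->err = ret;
			pending--;
		}
	}
}

/* Hypothetical ops: request 2 fails, the others succeed. */
static int mock_op(struct mock_req *req)
{
	return req->id == 2 ? -EINVAL : 0;
}

static int mock_submit(struct mock_req *req)
{
	req->polls_left = req->id;	/* later requests take longer */
	return -EINPROGRESS;
}

static int mock_poll(struct mock_req *req)
{
	if (req->polls_left > 0) {
		req->polls_left--;
		return -EAGAIN;
	}
	return mock_op(req);
}
```

In this sketch the asynchronous variant submits every request before polling
any of them, which is what lets several engines work on the chain at once;
the synchronous variant waits for each op() to finish before starting the
next. Each request keeps its own status, so one failure (request 2 here) does
not hide the results of the others.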
[1]: https://lore.kernel.org/linux-crypto/677614fbdc70b31df2e26483c8d2cd1510c8af91.1730021644.git.herbert@gondor.apana.org.au/

Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
Co-developed-by: Herbert Xu
Signed-off-by:
---
 crypto/acompress.c                  | 284 ++++++++++++++++++++++++++++
 include/crypto/acompress.h          |  41 ++++
 include/crypto/algapi.h             |  10 +
 include/crypto/internal/acompress.h |  10 +
 include/linux/crypto.h              |  31 +++
 5 files changed, 376 insertions(+)

diff --git a/crypto/acompress.c b/crypto/acompress.c
index 6fdf0ff9f3c0..cb6444d09dd7 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -23,6 +23,19 @@ struct crypto_scomp;
 
 static const struct crypto_type crypto_acomp_type;
 
+struct acomp_save_req_state {
+	struct list_head head;
+	struct acomp_req *req0;
+	struct acomp_req *cur;
+	int (*op)(struct acomp_req *req);
+	crypto_completion_t compl;
+	void *data;
+};
+
+static void acomp_reqchain_done(void *data, int err);
+static int acomp_save_req(struct acomp_req *req, crypto_completion_t cplt);
+static void acomp_restore_req(struct acomp_req *req);
+
 static inline struct acomp_alg *__crypto_acomp_alg(struct crypto_alg *alg)
 {
 	return container_of(alg, struct acomp_alg, calg.base);
@@ -123,6 +136,277 @@ struct crypto_acomp *crypto_alloc_acomp_node(const char *alg_name, u32 type,
 }
 EXPORT_SYMBOL_GPL(crypto_alloc_acomp_node);
 
+static int acomp_save_req(struct acomp_req *req, crypto_completion_t cplt)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct acomp_save_req_state *state;
+	gfp_t gfp;
+	u32 flags;
+
+	if (!acomp_is_async(tfm))
+		return 0;
+
+	flags = acomp_request_flags(req);
+	gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? GFP_KERNEL : GFP_ATOMIC;
+	state = kmalloc(sizeof(*state), gfp);
+	if (!state)
+		return -ENOMEM;
+
+	state->compl = req->base.complete;
+	state->data = req->base.data;
+	state->req0 = req;
+
+	req->base.complete = cplt;
+	req->base.data = state;
+
+	return 0;
+}
+
+static void acomp_restore_req(struct acomp_req *req)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct acomp_save_req_state *state;
+
+	if (!acomp_is_async(tfm))
+		return;
+
+	state = req->base.data;
+
+	req->base.complete = state->compl;
+	req->base.data = state->data;
+	kfree(state);
+}
+
+static int acomp_reqchain_finish(struct acomp_save_req_state *state,
+				 int err, u32 mask)
+{
+	struct acomp_req *req0 = state->req0;
+	struct acomp_req *req = state->cur;
+	struct acomp_req *n;
+
+	req->base.err = err;
+
+	if (req == req0)
+		INIT_LIST_HEAD(&req->base.list);
+	else
+		list_add_tail(&req->base.list, &req0->base.list);
+
+	list_for_each_entry_safe(req, n, &state->head, base.list) {
+		list_del_init(&req->base.list);
+
+		req->base.flags &= mask;
+		req->base.complete = acomp_reqchain_done;
+		req->base.data = state;
+		state->cur = req;
+		err = state->op(req);
+
+		if (err == -EINPROGRESS) {
+			if (!list_empty(&state->head))
+				err = -EBUSY;
+			goto out;
+		}
+
+		if (err == -EBUSY)
+			goto out;
+
+		req->base.err = err;
+		list_add_tail(&req->base.list, &req0->base.list);
+	}
+
+	acomp_restore_req(req0);
+
+out:
+	return err;
+}
+
+static void acomp_reqchain_done(void *data, int err)
+{
+	struct acomp_save_req_state *state = data;
+	crypto_completion_t compl = state->compl;
+
+	data = state->data;
+
+	if (err == -EINPROGRESS) {
+		if (!list_empty(&state->head))
+			return;
+		goto notify;
+	}
+
+	err = acomp_reqchain_finish(state, err, CRYPTO_TFM_REQ_MAY_BACKLOG);
+	if (err == -EBUSY)
+		return;
+
+notify:
+	compl(data, err);
+}
+
+int acomp_do_req_chain(struct acomp_req *req,
+		       int (*op)(struct acomp_req *req))
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct acomp_save_req_state *state;
+	struct acomp_save_req_state state0;
+	int err = 0;
+
+	if (!acomp_request_chained(req) || list_empty(&req->base.list) ||
+	    !crypto_acomp_req_chain(tfm))
+		return op(req);
+
+	state = &state0;
+
+	if (acomp_is_async(tfm)) {
+		err = acomp_save_req(req, acomp_reqchain_done);
+		if (err) {
+			struct acomp_req *r2;
+
+			req->base.err = err;
+			list_for_each_entry(r2, &req->base.list, base.list)
+				r2->base.err = err;
+
+			return err;
+		}
+
+		state = req->base.data;
+	}
+
+	state->op = op;
+	state->cur = req;
+	INIT_LIST_HEAD(&state->head);
+	list_splice(&req->base.list, &state->head);
+
+	err = op(req);
+	if (err == -EBUSY || err == -EINPROGRESS)
+		return -EBUSY;
+
+	return acomp_reqchain_finish(state, err, ~0);
+}
+EXPORT_SYMBOL_GPL(acomp_do_req_chain);
+
+static void acomp_async_reqchain_done(struct acomp_req *req0,
+				      struct list_head *state,
+				      int (*op_poll)(struct acomp_req *req))
+{
+	struct acomp_req *req, *n;
+	bool req0_done = false;
+	int err;
+
+	while (!list_empty(state)) {
+
+		if (!req0_done) {
+			err = op_poll(req0);
+			if (!(err == -EAGAIN || err == -EINPROGRESS || err == -EBUSY)) {
+				req0->base.err = err;
+				req0_done = true;
+			}
+		}
+
+		list_for_each_entry_safe(req, n, state, base.list) {
+			err = op_poll(req);
+
+			if (err == -EAGAIN || err == -EINPROGRESS || err == -EBUSY)
+				continue;
+
+			req->base.err = err;
+			list_del_init(&req->base.list);
+			list_add_tail(&req->base.list, &req0->base.list);
+		}
+	}
+
+	while (!req0_done) {
+		err = op_poll(req0);
+		if (!(err == -EAGAIN || err == -EINPROGRESS || err == -EBUSY)) {
+			req0->base.err = err;
+			break;
+		}
+	}
+}
+
+static int acomp_async_reqchain_finish(struct acomp_req *req0,
+				       struct list_head *state,
+				       int (*op_submit)(struct acomp_req *req),
+				       int (*op_poll)(struct acomp_req *req))
+{
+	struct acomp_req *req, *n;
+	int err = 0;
+
+	INIT_LIST_HEAD(&req0->base.list);
+
+	list_for_each_entry_safe(req, n, state, base.list) {
+		BUG_ON(req == req0);
+
+		err = op_submit(req);
+
+		if (!(err == -EINPROGRESS || err == -EBUSY)) {
+			req->base.err = err;
+			list_del_init(&req->base.list);
+			list_add_tail(&req->base.list, &req0->base.list);
+		}
+	}
+
+	acomp_async_reqchain_done(req0, state, op_poll);
+
+	return req0->base.err;
+}
+
+int acomp_do_async_req_chain(struct acomp_req *req,
+			     int (*op_submit)(struct acomp_req *req),
+			     int (*op_poll)(struct acomp_req *req))
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct list_head state;
+	struct acomp_req *r2;
+	int err = 0;
+	void *req0_data = req->base.data;
+
+	if (!acomp_request_chained(req) || list_empty(&req->base.list) ||
+	    !acomp_is_async(tfm) || !crypto_acomp_req_chain(tfm)) {
+
+		err = op_submit(req);
+
+		if (err == -EINPROGRESS || err == -EBUSY) {
+			bool req0_done = false;
+
+			while (!req0_done) {
+				err = op_poll(req);
+				if (!(err == -EAGAIN || err == -EINPROGRESS || err == -EBUSY)) {
+					req->base.err = err;
+					break;
+				}
+			}
+		} else {
+			req->base.err = err;
+		}
+
+		req->base.data = req0_data;
+		if (acomp_is_async(tfm))
+			req->base.complete(req->base.data, req->base.err);
+
+		return err;
+	}
+
+	err = op_submit(req);
+	req->base.err = err;
+
+	if (err && !(err == -EINPROGRESS || err == -EBUSY))
+		goto err_prop;
+
+	INIT_LIST_HEAD(&state);
+	list_splice(&req->base.list, &state);
+
+	err = acomp_async_reqchain_finish(req, &state, op_submit, op_poll);
+	req->base.data = req0_data;
+	req->base.complete(req->base.data, req->base.err);
+
+	return err;
+
+err_prop:
+	list_for_each_entry(r2, &req->base.list, base.list)
+		r2->base.err = err;
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(acomp_do_async_req_chain);
+
 struct acomp_req *acomp_request_alloc(struct crypto_acomp *acomp)
 {
 	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp);
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 54937b615239..eadc24514056 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -206,6 +206,7 @@ static inline void acomp_request_set_callback(struct acomp_req *req,
 	req->base.data = data;
 	req->base.flags &= CRYPTO_ACOMP_ALLOC_OUTPUT;
 	req->base.flags |= flgs & ~CRYPTO_ACOMP_ALLOC_OUTPUT;
+	req->base.flags &= ~CRYPTO_TFM_REQ_CHAIN;
 }
 
 /**
@@ -237,6 +238,46 @@ static inline void acomp_request_set_params(struct acomp_req *req,
 		req->flags |= CRYPTO_ACOMP_ALLOC_OUTPUT;
 }
 
+static inline u32 acomp_request_flags(struct acomp_req *req)
+{
+	return req->base.flags;
+}
+
+static inline void acomp_reqchain_init(struct acomp_req *req,
+				       u32 flags, crypto_completion_t compl,
+				       void *data)
+{
+	acomp_request_set_callback(req, flags, compl, data);
+	crypto_reqchain_init(&req->base);
+}
+
+static inline void acomp_reqchain_clear(struct acomp_req *req, void *data)
+{
+	struct crypto_wait *wait = (struct crypto_wait *)data;
+	reinit_completion(&wait->completion);
+	crypto_reqchain_clear(&req->base);
+	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   crypto_req_done, data);
+}
+
+static inline void acomp_request_chain(struct acomp_req *req,
+				       struct acomp_req *head)
+{
+	crypto_request_chain(&req->base, &head->base);
+}
+
+int acomp_do_req_chain(struct acomp_req *req,
+		       int (*op)(struct acomp_req *req));
+
+int acomp_do_async_req_chain(struct acomp_req *req,
+			     int (*op_submit)(struct acomp_req *req),
+			     int (*op_poll)(struct acomp_req *req));
+
+static inline int acomp_request_err(struct acomp_req *req)
+{
+	return req->base.err;
+}
+
 /**
  * crypto_acomp_compress() -- Invoke asynchronous compress operation
  *
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index 156de41ca760..c5df380c7d08 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -271,4 +271,14 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm *tfm)
 	return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
 }
 
+static inline bool crypto_request_chained(struct crypto_async_request *req)
+{
+	return req->flags & CRYPTO_TFM_REQ_CHAIN;
+}
+
+static inline bool crypto_tfm_req_chain(struct crypto_tfm *tfm)
+{
+	return tfm->__crt_alg->cra_flags & CRYPTO_ALG_REQ_CHAIN;
+}
+
 #endif	/* _CRYPTO_ALGAPI_H */
diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
index 8831edaafc05..53b4ef59b48c 100644
--- a/include/crypto/internal/acompress.h
+++ b/include/crypto/internal/acompress.h
@@ -84,6 +84,16 @@ static inline void __acomp_request_free(struct acomp_req *req)
 	kfree_sensitive(req);
 }
 
+static inline bool acomp_request_chained(struct acomp_req *req)
+{
+	return crypto_request_chained(&req->base);
+}
+
+static inline bool crypto_acomp_req_chain(struct crypto_acomp *tfm)
+{
+	return crypto_tfm_req_chain(&tfm->base);
+}
+
 /**
  * crypto_register_acomp() -- Register asynchronous compression algorithm
  *
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index b164da5e129e..7c27d557586c 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -13,6 +13,8 @@
 #define _LINUX_CRYPTO_H
 
 #include
+#include
+#include
 #include
 #include
 #include
@@ -124,6 +126,9 @@
  */
 #define CRYPTO_ALG_FIPS_INTERNAL	0x00020000
 
+/* Set if the algorithm supports request chains. */
+#define CRYPTO_ALG_REQ_CHAIN		0x00040000
+
 /*
  * Transform masks and values (for crt_flags).
  */
@@ -133,6 +138,7 @@
 #define CRYPTO_TFM_REQ_FORBID_WEAK_KEYS	0x00000100
 #define CRYPTO_TFM_REQ_MAY_SLEEP	0x00000200
 #define CRYPTO_TFM_REQ_MAY_BACKLOG	0x00000400
+#define CRYPTO_TFM_REQ_CHAIN		0x00000800
 
 /*
  * Miscellaneous stuff.
@@ -174,6 +180,7 @@ struct crypto_async_request {
 	struct crypto_tfm *tfm;
 
 	u32 flags;
+	int err;
 };
 
 /**
@@ -540,5 +547,29 @@ int crypto_comp_decompress(struct crypto_comp *tfm,
 			   const u8 *src, unsigned int slen,
 			   u8 *dst, unsigned int *dlen);
 
+static inline void crypto_reqchain_init(struct crypto_async_request *req)
+{
+	req->err = -EINPROGRESS;
+	req->flags |= CRYPTO_TFM_REQ_CHAIN;
+	INIT_LIST_HEAD(&req->list);
+}
+
+static inline void crypto_reqchain_clear(struct crypto_async_request *req)
+{
+	req->flags &= ~CRYPTO_TFM_REQ_CHAIN;
+}
+
+static inline void crypto_request_chain(struct crypto_async_request *req,
+					struct crypto_async_request *head)
+{
+	req->err = -EINPROGRESS;
+	list_add_tail(&req->list, &head->list);
+}
+
+static inline bool crypto_tfm_is_async(struct crypto_tfm *tfm)
+{
+	return tfm->__crt_alg->cra_flags & CRYPTO_ALG_ASYNC;
+}
+
 #endif	/* _LINUX_CRYPTO_H */
 
-- 
2.27.0
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v5 02/12] crypto: acomp - Define new interfaces for compress/decompress batching.
Date: Fri, 20 Dec 2024 22:31:09 -0800
Message-Id: <20241221063119.29140-3-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>
References: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>

This commit adds get_batch_size(), batch_compress() and batch_decompress()
interfaces to:

  struct acomp_alg
  struct crypto_acomp

A crypto_acomp compression algorithm that supports batching of compressions
and decompressions must provide implementations for these APIs.

A new helper function, acomp_has_async_batching(), can be invoked to query
whether a crypto_acomp has registered these batching interfaces.

A higher-level module like zswap can call acomp_has_async_batching() to
detect whether the compressor supports batching and, if so, call the new
crypto_acomp_batch_size() to get the maximum batch size supported by the
compressor. Based on this, zswap can allocate batching resources for the
minimum of its own upper limit on batch size (if any) and the compressor's
maximum batch size.
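The batch-size negotiation described above can be sketched in userspace as
follows. This is a minimal illustration, not kernel code: struct mock_acomp,
mock_has_async_batching(), mock_batch_size(), pick_batch_size() and
nr_batches() are hypothetical stand-ins that only mirror the documented
behaviour of acomp_has_async_batching() and crypto_acomp_batch_size()
(including the fallback to 1), the CRYPTO_ALG_TYPE_ACOMPRESS check is
omitted, and MOCK_ZSWAP_MAX_BATCH = 8 is an assumed cap, not a zswap
constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a compressor's capabilities; not the kernel's
 * struct crypto_acomp.
 */
struct mock_acomp {
	bool is_async;				/* CRYPTO_ALG_ASYNC is set */
	unsigned int (*get_batch_size)(void);	/* NULL: no batching */
	bool has_batch_ops;			/* batch_(de)compress registered */
};

/* Mirrors acomp_has_async_batching(): all batching hooks must be present. */
static bool mock_has_async_batching(const struct mock_acomp *tfm)
{
	return tfm->is_async && tfm->get_batch_size && tfm->has_batch_ops;
}

/* Mirrors crypto_acomp_batch_size(): falls back to 1 without batching. */
static unsigned int mock_batch_size(const struct mock_acomp *tfm)
{
	if (mock_has_async_batching(tfm))
		return tfm->get_batch_size();
	return 1;
}

/* Hypothetical zswap-side upper limit on the batch size. */
#define MOCK_ZSWAP_MAX_BATCH 8U

/* Batch size to allocate resources for: min(zswap limit, compressor max). */
static unsigned int pick_batch_size(const struct mock_acomp *tfm)
{
	unsigned int hw = mock_batch_size(tfm);

	return hw < MOCK_ZSWAP_MAX_BATCH ? hw : MOCK_ZSWAP_MAX_BATCH;
}

/* Number of batch calls needed to cover @nr_pages pages. */
static unsigned int nr_batches(unsigned int nr_pages, unsigned int batch)
{
	assert(batch > 0);
	return (nr_pages + batch - 1) / batch;
}

/* Hypothetical compressors advertising different maximum batch sizes. */
static unsigned int large_batch(void) { return 32; }
static unsigned int small_batch(void) { return 4; }
```

Because the fallback is 1 rather than 0, in this sketch a compressor without
batching simply yields one call per page, so the caller can size its
resources with the same code path either way.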
This allows the iaa_crypto Intel IAA driver to register implementations for
the get_batch_size(), batch_compress() and batch_decompress() acomp_alg
interfaces. The kernel zswap/zram modules can then invoke these through the
following newly added crypto_acomp APIs to compress/decompress pages in
parallel in the IAA hardware accelerator, improving swapout/swapin
performance:

  crypto_acomp_batch_size()
  crypto_acomp_batch_compress()
  crypto_acomp_batch_decompress()

Signed-off-by: Kanchana P Sridhar
---
 crypto/acompress.c                  |   3 +
 include/crypto/acompress.h          | 111 ++++++++++++++++++++++++++++
 include/crypto/internal/acompress.h |  19 +++++
 3 files changed, 133 insertions(+)

diff --git a/crypto/acompress.c b/crypto/acompress.c
index cb6444d09dd7..165559a8b9bd 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -84,6 +84,9 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
 
 	acomp->compress = alg->compress;
 	acomp->decompress = alg->decompress;
+	acomp->get_batch_size = alg->get_batch_size;
+	acomp->batch_compress = alg->batch_compress;
+	acomp->batch_decompress = alg->batch_decompress;
 	acomp->dst_free = alg->dst_free;
 	acomp->reqsize = alg->reqsize;
 
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index eadc24514056..8451ade70fd8 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -43,6 +43,10 @@ struct acomp_req {
  *
  * @compress:		Function performs a compress operation
  * @decompress:		Function performs a de-compress operation
+ * @get_batch_size:	Maximum batch-size for batching compress/decompress
+ *			operations.
+ * @batch_compress:	Function performs a batch compress operation
+ * @batch_decompress:	Function performs a batch decompress operation
  * @dst_free:		Frees destination buffer if allocated inside the
  *			algorithm
  * @reqsize:		Context size for (de)compression requests
@@ -51,6 +55,21 @@ struct acomp_req {
 struct crypto_acomp {
 	int (*compress)(struct acomp_req *req);
 	int (*decompress)(struct acomp_req *req);
+	unsigned int (*get_batch_size)(void);
+	bool (*batch_compress)(struct acomp_req *reqs[],
+			       struct crypto_wait *wait,
+			       struct page *pages[],
+			       u8 *dsts[],
+			       unsigned int dlens[],
+			       int errors[],
+			       int nr_pages);
+	bool (*batch_decompress)(struct acomp_req *reqs[],
+				 struct crypto_wait *wait,
+				 u8 *srcs[],
+				 struct page *pages[],
+				 unsigned int slens[],
+				 int errors[],
+				 int nr_pages);
 	void (*dst_free)(struct scatterlist *dst);
 	unsigned int reqsize;
 	struct crypto_tfm base;
@@ -142,6 +161,13 @@ static inline bool acomp_is_async(struct crypto_acomp *tfm)
 		CRYPTO_ALG_ASYNC;
 }
 
+static inline bool acomp_has_async_batching(struct crypto_acomp *tfm)
+{
+	return (acomp_is_async(tfm) &&
+		(crypto_comp_alg_common(tfm)->base.cra_flags & CRYPTO_ALG_TYPE_ACOMPRESS) &&
+		tfm->get_batch_size && tfm->batch_compress && tfm->batch_decompress);
+}
+
 static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req)
 {
 	return __crypto_acomp_tfm(req->base.tfm);
@@ -306,4 +332,89 @@ static inline int crypto_acomp_decompress(struct acomp_req *req)
 	return crypto_acomp_reqtfm(req)->decompress(req);
 }
 
+/**
+ * crypto_acomp_batch_size() -- Get the algorithm's batch size
+ *
+ * Function returns the algorithm's batch size for batching operations
+ *
+ * @tfm:	ACOMPRESS tfm handle allocated with crypto_alloc_acomp()
+ *
+ * Return:	crypto_acomp's batch size.
+ */
+static inline unsigned int crypto_acomp_batch_size(struct crypto_acomp *tfm)
+{
+	if (acomp_has_async_batching(tfm))
+		return tfm->get_batch_size();
+
+	return 1;
+}
+
+/**
+ * crypto_acomp_batch_compress() -- Invoke asynchronous compress of
+ *				    a batch of requests
+ *
+ * Function invokes the asynchronous batch compress operation
+ *
+ * @reqs: @nr_pages asynchronous compress requests.
+ * @wait: crypto_wait for acomp batch compress with synchronous/asynchronous
+ *        request chaining. If NULL, the driver must provide a way to process
+ *        request completions asynchronously.
+ * @pages: Pages to be compressed.
+ * @dsts: Pre-allocated destination buffers to store results of compression.
+ * @dlens: Will contain the compressed lengths.
+ * @errors: zero on successful compression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages to be compressed.
+ *
+ * Returns true if all compress requests complete successfully,
+ * false otherwise.
+ */
+static inline bool crypto_acomp_batch_compress(struct acomp_req *reqs[],
+					       struct crypto_wait *wait,
+					       struct page *pages[],
+					       u8 *dsts[],
+					       unsigned int dlens[],
+					       int errors[],
+					       int nr_pages)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]);
+
+	return tfm->batch_compress(reqs, wait, pages, dsts,
+				   dlens, errors, nr_pages);
+}
+
+/**
+ * crypto_acomp_batch_decompress() -- Invoke asynchronous decompress of
+ *				      a batch of requests
+ *
+ * Function invokes the asynchronous batch decompress operation
+ *
+ * @reqs: @nr_pages asynchronous decompress requests.
+ * @wait: crypto_wait for acomp batch decompress with synchronous/asynchronous
+ *        request chaining. If NULL, the driver must provide a way to process
+ *        request completions asynchronously.
+ * @srcs: The src buffers to be decompressed.
+ * @pages: The pages to store the decompressed buffers.
+ * @slens: Compressed lengths of @srcs.
+ * @errors: zero on successful decompression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages to be decompressed.
+ *
+ * Returns true if all decompress requests complete successfully,
+ * false otherwise.
+ */
+static inline bool crypto_acomp_batch_decompress(struct acomp_req *reqs[],
+						 struct crypto_wait *wait,
+						 u8 *srcs[],
+						 struct page *pages[],
+						 unsigned int slens[],
+						 int errors[],
+						 int nr_pages)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]);
+
+	return tfm->batch_decompress(reqs, wait, srcs, pages,
+				     slens, errors, nr_pages);
+}
+
 #endif
diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
index 53b4ef59b48c..df0e192801ff 100644
--- a/include/crypto/internal/acompress.h
+++ b/include/crypto/internal/acompress.h
@@ -17,6 +17,10 @@
  *
  * @compress:	Function performs a compress operation
  * @decompress:	Function performs a de-compress operation
+ * @get_batch_size:	Maximum batch-size for batching compress/decompress
+ *			operations.
+ * @batch_compress:	Function performs a batch compress operation
+ * @batch_decompress:	Function performs a batch decompress operation
  * @dst_free:	Frees destination buffer if allocated inside the algorithm
  * @init:	Initialize the cryptographic transformation object.
  *		This function is used to initialize the cryptographic
* This function is used to initialize the cryptographic @@ -37,6 +41,21 @@ struct acomp_alg { int (*compress)(struct acomp_req *req); int (*decompress)(struct acomp_req *req); + unsigned int (*get_batch_size)(void); + bool (*batch_compress)(struct acomp_req *reqs[], + struct crypto_wait *wait, + struct page *pages[], + u8 *dsts[], + unsigned int dlens[], + int errors[], + int nr_pages); + bool (*batch_decompress)(struct acomp_req *reqs[], + struct crypto_wait *wait, + u8 *srcs[], + struct page *pages[], + unsigned int slens[], + int errors[], + int nr_pages); void (*dst_free)(struct scatterlist *dst); int (*init)(struct crypto_acomp *tfm); void (*exit)(struct crypto_acomp *tfm); --=20 2.27.0 From nobody Thu Jan 2 14:56:04 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 432B51EC4D2; Sat, 21 Dec 2024 06:31:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734762684; cv=none; b=WhCVa0aumuKXOGagHb9wTgDse4+lJUFkK6TwZG4PKgwJRt/K3U8GXAlYMFQa88vNUJDMrNE/VQjWMMEeJLteEMlHHXd7zwYB5vTw+u3Z8ntzL3LYqIBxJ/+iMVcExZcVJ5oBPbDQfeNgas4ffu24PWZ97h13Bi0EFRTNZjEwlbA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734762684; c=relaxed/simple; bh=kq/8cpMYX4ePi7N8tA49JxZ+4FRLNlknFPm5kH1FrRw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=atm22Eitu3/epRHhO9lQ6/b/GZjQCFp+BAklYsgqOahCnU8rW5hqanUSpt7Y+kViez0oJN7iq60hq6td/zEWWNoCIbBQhNmO/nIjfDYXXpbprfZuM7ttPwaDEka5LZoLOdKe+FwneOesn+cG2aEAeC/3eZiIV8SYkVHX/dogW9M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org,
Subject: [PATCH v5 03/12] crypto: iaa - Add an acomp_req flag CRYPTO_ACOMP_REQ_POLL to enable async mode.
Date: Fri, 20 Dec 2024 22:31:10 -0800

If the iaa_crypto driver has async_mode set to true and use_irq set to
false, it can still be forced to use synchronous mode by clearing the
CRYPTO_ACOMP_REQ_POLL flag in req->flags.

In other words, all three of the following need to be true for a request
to be processed in fully async poll mode:

 1) async_mode should be "true"
 2) use_irq should be "false"
 3) the CRYPTO_ACOMP_REQ_POLL bit should be set in req->flags

Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 11 ++++++++++-
 include/crypto/acompress.h                 |  5 +++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 9e557649e5d0..29d03df39fab 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -1520,6 +1520,10 @@ static int iaa_comp_acompress(struct acomp_req *req)
 		return -EINVAL;
 	}
 
+	/* If the caller has requested no polling, disable async. */
+	if (!(req->flags & CRYPTO_ACOMP_REQ_POLL))
+		disable_async = true;
+
 	cpu = get_cpu();
 	wq = wq_table_next_wq(cpu);
 	put_cpu();
@@ -1712,6 +1716,7 @@ static int iaa_comp_adecompress(struct acomp_req *req)
 {
 	struct crypto_tfm *tfm = req->base.tfm;
 	dma_addr_t src_addr, dst_addr;
+	bool disable_async = false;
 	int nr_sgs, cpu, ret = 0;
 	struct iaa_wq *iaa_wq;
 	struct device *dev;
@@ -1727,6 +1732,10 @@ static int iaa_comp_adecompress(struct acomp_req *req)
 		return -EINVAL;
 	}
 
+	/* If the caller has requested no polling, disable async. */
+	if (!(req->flags & CRYPTO_ACOMP_REQ_POLL))
+		disable_async = true;
+
 	if (!req->dst)
 		return iaa_comp_adecompress_alloc_dest(req);
 
@@ -1775,7 +1784,7 @@ static int iaa_comp_adecompress(struct acomp_req *req)
 		req->dst, req->dlen, sg_dma_len(req->dst));
 
 	ret = iaa_decompress(tfm, req, wq, src_addr, req->slen,
-			     dst_addr, &req->dlen, false);
+			     dst_addr, &req->dlen, disable_async);
 	if (ret == -EINPROGRESS)
 		return ret;
 
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 8451ade70fd8..e090538e8406 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -14,6 +14,11 @@
 #include
 
 #define CRYPTO_ACOMP_ALLOC_OUTPUT	0x00000001
+/*
+ * If set, the driver must have a way to submit the req, then
+ * poll its completion status for success/error.
+ */
+#define CRYPTO_ACOMP_REQ_POLL		0x00000002
 #define CRYPTO_ACOMP_DST_MAX		131072
 
 /**
-- 
2.27.0
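
The three conditions listed in the patch 03 commit message can be sketched as a small userspace C model. This is a hypothetical illustration, not driver code: struct fake_req, uses_async_poll() and set_req_poll() are invented names; only the flag value 0x00000002 and the roles of async_mode, use_irq and CRYPTO_ACOMP_REQ_POLL are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value copied from the CRYPTO_ACOMP_REQ_POLL definition in the patch. */
#define REQ_POLL_FLAG 0x00000002u

/* Hypothetical stand-in for struct acomp_req: only the flags word matters. */
struct fake_req {
	uint32_t flags;
};

/*
 * True only when all three conditions hold: async_mode is true, use_irq is
 * false, and the request carries the poll flag.
 */
static bool uses_async_poll(bool async_mode, bool use_irq,
			    const struct fake_req *req)
{
	return async_mode && !use_irq && (req->flags & REQ_POLL_FLAG);
}

/* Sets or clears the poll flag on every request in a batch. */
static void set_req_poll(struct fake_req *reqs[], int nr_reqs, bool set_flag)
{
	assert(nr_reqs >= 0);
	for (int i = 0; i < nr_reqs; ++i) {
		if (set_flag)
			reqs[i]->flags |= REQ_POLL_FLAG;
		else
			reqs[i]->flags &= ~REQ_POLL_FLAG;
	}
}
```

The if/else in set_req_poll() is a stylistic choice; it has the same effect as the conditional expression used by iaa_set_req_poll() later in this series.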

From: Kanchana P Sridhar
Subject: [PATCH v5 04/12] crypto: iaa - Implement batch_compress(), batch_decompress() API in iaa_crypto.
Date: Fri, 20 Dec 2024 22:31:11 -0800

This patch provides iaa_crypto driver implementations for the newly added
crypto_acomp batch_compress() and batch_decompress() interfaces using
acomp request chaining.

iaa_crypto also implements the new crypto_acomp get_batch_size()
interface, which returns an iaa_crypto driver-specific constant,
IAA_CRYPTO_MAX_BATCH_SIZE (currently set to 8U).

This allows swap modules such as zswap/zram to allocate the required
batching resources and then invoke fully asynchronous, batch-parallel
compression/decompression of pages on systems with Intel IAA, using
these APIs, respectively:

 crypto_acomp_batch_size(...);
 crypto_acomp_batch_compress(...);
 crypto_acomp_batch_decompress(...);

This enables zswap compress batching code to be developed in a manner
similar to the current single-page synchronous calls to:

 crypto_acomp_compress(...);
 crypto_acomp_decompress(...);

thereby facilitating an encapsulated and modular hand-off between the
kernel zswap/zram code and the crypto_acomp layer.

Since iaa_crypto supports acomp request chaining, this patch also adds
CRYPTO_ALG_REQ_CHAIN to the iaa_acomp_fixed_deflate algorithm's
cra_flags.
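
The caller-side pattern described above (query the batch size, then submit pages in batches no larger than it) can be modelled in userspace C. This is a sketch under stated assumptions: struct fake_batch_ops, compress_in_batches() and the toy demo_* "driver" are hypothetical; the real batch_compress() hook takes acomp_reqs, a crypto_wait, pages and destination buffers, and zswap may organize its resources differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of a driver exposing get_batch_size() and a batch
 * compress hook. Plain ints stand in for pages.
 */
struct fake_batch_ops {
	unsigned int (*get_batch_size)(void);
	bool (*batch_compress)(const int in[], unsigned int dlens[],
			       int errors[], int nr);
};

/* Toy driver for illustration only; batch size 8 mirrors IAA's 8U. */
static int demo_max_nr_seen;

static unsigned int demo_get_batch_size(void)
{
	return 8;
}

static bool demo_batch_compress(const int in[], unsigned int dlens[],
				int errors[], int nr)
{
	bool all_ok = true;

	if (nr > demo_max_nr_seen)
		demo_max_nr_seen = nr;
	for (int i = 0; i < nr; i++) {
		/* Negative inputs stand in for pages that fail to compress. */
		errors[i] = in[i] < 0 ? -EINVAL : 0;
		dlens[i] = errors[i] ? 0 : (unsigned int)in[i] / 2;
		if (errors[i])
			all_ok = false;
	}
	return all_ok;
}

/*
 * Splits nr_pages into chunks no larger than the driver's batch size.
 * Returns the number of batch calls made, or -1 if any page failed.
 */
static int compress_in_batches(const struct fake_batch_ops *ops,
			       const int in[], unsigned int dlens[],
			       int errors[], int nr_pages)
{
	int batch = (int)ops->get_batch_size();
	int calls = 0;
	bool ok = true;

	assert(batch > 0);
	for (int start = 0; start < nr_pages; start += batch) {
		int nr = nr_pages - start < batch ? nr_pages - start : batch;

		if (!ops->batch_compress(&in[start], &dlens[start],
					 &errors[start], nr))
			ok = false;
		calls++;
	}
	return ok ? calls : -1;
}
```

Chunking on the caller side matters because both batch functions in this patch treat nr_pages > IAA_CRYPTO_MAX_BATCH_SIZE as a bug (BUG_ON), so a caller holding more pages than that must split them.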
Suggested-by: Yosry Ahmed
Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |   9 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 395 ++++++++++++++++++++-
 2 files changed, 403 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index 56985e395263..b3b67c44ec8a 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -39,6 +39,15 @@
 				 IAA_DECOMP_CHECK_FOR_EOB | \
 				 IAA_DECOMP_STOP_ON_EOB)
 
+/*
+ * The maximum compress/decompress batch size for IAA's implementation of
+ * the crypto_acomp batch_compress() and batch_decompress() interfaces.
+ * The IAA compression algorithms should provide the crypto_acomp
+ * get_batch_size() interface through a function that returns this
+ * constant.
+ */
+#define IAA_CRYPTO_MAX_BATCH_SIZE 8U
+
 /* Representation of IAA workqueue */
 struct iaa_wq {
 	struct list_head	list;
diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 29d03df39fab..b51b0b4b9ac3 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -1807,6 +1807,396 @@ static void compression_ctx_init(struct iaa_compression_ctx *ctx)
 	ctx->use_irq = use_irq;
 }
 
+static int iaa_comp_poll(struct acomp_req *req)
+{
+	struct idxd_desc *idxd_desc;
+	struct idxd_device *idxd;
+	struct iaa_wq *iaa_wq;
+	struct pci_dev *pdev;
+	struct device *dev;
+	struct idxd_wq *wq;
+	bool compress_op;
+	int ret;
+
+	idxd_desc = req->base.data;
+	if (!idxd_desc)
+		return -EAGAIN;
+
+	compress_op = (idxd_desc->iax_hw->opcode == IAX_OPCODE_COMPRESS);
+	wq = idxd_desc->wq;
+	iaa_wq = idxd_wq_get_private(wq);
+	idxd = iaa_wq->iaa_device->idxd;
+	pdev = idxd->pdev;
+	dev = &pdev->dev;
+
+	ret = check_completion(dev, idxd_desc->iax_completion, true, true);
+	if (ret == -EAGAIN)
+		return ret;
+	if (ret)
+		goto out;
+	req->dlen = idxd_desc->iax_completion->output_size;
+
+	/* Update stats */
+	if (compress_op) {
+		update_total_comp_bytes_out(req->dlen);
+		update_wq_comp_bytes(wq, req->dlen);
+	} else {
+		update_total_decomp_bytes_in(req->slen);
+		update_wq_decomp_bytes(wq, req->slen);
+	}
+
+	if (iaa_verify_compress && (idxd_desc->iax_hw->opcode == IAX_OPCODE_COMPRESS)) {
+		struct crypto_tfm *tfm = req->base.tfm;
+		dma_addr_t src_addr, dst_addr;
+		u32 compression_crc;
+
+		compression_crc = idxd_desc->iax_completion->crc;
+
+		dma_sync_sg_for_device(dev, req->dst, 1, DMA_FROM_DEVICE);
+		dma_sync_sg_for_device(dev, req->src, 1, DMA_TO_DEVICE);
+
+		src_addr = sg_dma_address(req->src);
+		dst_addr = sg_dma_address(req->dst);
+
+		ret = iaa_compress_verify(tfm, req, wq, src_addr, req->slen,
+					  dst_addr, &req->dlen, compression_crc);
+	}
+out:
+	/* caller doesn't call crypto_wait_req, so no acomp_request_complete() */
+
+	dma_unmap_sg(dev, req->dst, sg_nents(req->dst), DMA_FROM_DEVICE);
+	dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_TO_DEVICE);
+
+	idxd_free_desc(idxd_desc->wq, idxd_desc);
+
+	dev_dbg(dev, "%s: returning ret=%d\n", __func__, ret);
+
+	return ret;
+}
+
+static unsigned int iaa_comp_get_batch_size(void)
+{
+	return IAA_CRYPTO_MAX_BATCH_SIZE;
+}
+
+static void iaa_set_req_poll(
+	struct acomp_req *reqs[],
+	int nr_reqs,
+	bool set_flag)
+{
+	int i;
+
+	for (i = 0; i < nr_reqs; ++i) {
+		set_flag ? (reqs[i]->flags |= CRYPTO_ACOMP_REQ_POLL) :
+			   (reqs[i]->flags &= ~CRYPTO_ACOMP_REQ_POLL);
+	}
+}
+
+/**
+ * This API provides IAA compress batching functionality for use by swap
+ * modules.
+ *
+ * @reqs: @nr_pages asynchronous compress requests.
+ * @wait: crypto_wait for acomp batch compress implemented using request
+ *        chaining. Required if async_mode is "false". If async_mode is "true",
+ *        and @wait is NULL, the completions will be processed using
+ *        asynchronous polling of the requests' completion statuses.
+ * @pages: Pages to be compressed by IAA.
+ * @dsts: Pre-allocated destination buffers to store results of IAA
+ *        compression. Each element of @dsts must be of size "PAGE_SIZE * 2".
+ * @dlens: Will contain the compressed lengths.
+ * @errors: zero on successful compression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages, up to IAA_CRYPTO_MAX_BATCH_SIZE,
+ *            to be compressed.
+ *
+ * Returns true if all compress requests complete successfully,
+ * false otherwise.
+ */
+static bool iaa_comp_acompress_batch(
+	struct acomp_req *reqs[],
+	struct crypto_wait *wait,
+	struct page *pages[],
+	u8 *dsts[],
+	unsigned int dlens[],
+	int errors[],
+	int nr_pages)
+{
+	struct scatterlist inputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+	struct scatterlist outputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+	bool compressions_done = false;
+	bool async = (async_mode && !use_irq);
+	bool async_poll = (async && !wait);
+	int i, err = 0;
+
+	BUG_ON(nr_pages > IAA_CRYPTO_MAX_BATCH_SIZE);
+	BUG_ON(!async && !wait);
+
+	if (async)
+		iaa_set_req_poll(reqs, nr_pages, true);
+	else
+		iaa_set_req_poll(reqs, nr_pages, false);
+
+	/*
+	 * Prepare and submit acomp_reqs to IAA. IAA will process these
+	 * compress jobs in parallel if async_mode is true.
+	 */
+	for (i = 0; i < nr_pages; ++i) {
+		sg_init_table(&inputs[i], 1);
+		sg_set_page(&inputs[i], pages[i], PAGE_SIZE, 0);
+
+		/*
+		 * Each dst buffer should be of size (PAGE_SIZE * 2).
+		 * Reflect same in sg_list.
+		 */
+		sg_init_one(&outputs[i], dsts[i], PAGE_SIZE * 2);
+		acomp_request_set_params(reqs[i], &inputs[i],
+					 &outputs[i], PAGE_SIZE, dlens[i]);
+
+		/*
+		 * As long as the API is called with a valid "wait", chain the
+		 * requests for synchronous/asynchronous compress ops.
+		 * If async_mode is in effect, but the API is called with a
+		 * NULL "wait", submit the requests first, and poll for
+		 * their completion status later, after all descriptors have
+		 * been submitted.
+		 */
+		if (!async_poll) {
+			/* acomp request chaining. */
+			if (i)
+				acomp_request_chain(reqs[i], reqs[0]);
+			else
+				acomp_reqchain_init(reqs[0], 0, crypto_req_done,
+						    wait);
+		} else {
+			errors[i] = iaa_comp_acompress(reqs[i]);
+
+			if (errors[i] != -EINPROGRESS) {
+				errors[i] = -EINVAL;
+				err = -EINVAL;
+			} else {
+				errors[i] = -EAGAIN;
+			}
+		}
+	}
+
+	if (!async_poll) {
+		if (async)
+			/* Process the request chain in parallel. */
+			err = crypto_wait_req(acomp_do_async_req_chain(reqs[0],
+					      iaa_comp_acompress, iaa_comp_poll),
+					      wait);
+		else
+			/* Process the request chain in series. */
+			err = crypto_wait_req(acomp_do_req_chain(reqs[0],
+					      iaa_comp_acompress), wait);
+
+		for (i = 0; i < nr_pages; ++i) {
+			errors[i] = acomp_request_err(reqs[i]);
+			if (errors[i]) {
+				err = -EINVAL;
+				pr_debug("Request chaining req %d compress error %d\n", i, errors[i]);
+			} else {
+				dlens[i] = reqs[i]->dlen;
+			}
+		}
+
+		goto reset_reqs;
+	}
+
+	/*
+	 * Asynchronously poll for and process IAA compress job completions.
+	 */
+	while (!compressions_done) {
+		compressions_done = true;
+
+		for (i = 0; i < nr_pages; ++i) {
+			/*
+			 * Skip, if the compression has already completed
+			 * successfully or with an error.
+			 */
+			if (errors[i] != -EAGAIN)
+				continue;
+
+			errors[i] = iaa_comp_poll(reqs[i]);
+
+			if (errors[i]) {
+				if (errors[i] == -EAGAIN)
+					compressions_done = false;
+				else
+					err = -EINVAL;
+			} else {
+				dlens[i] = reqs[i]->dlen;
+			}
+		}
+	}
+
+reset_reqs:
+	/*
+	 * For the same 'reqs[]' to be usable by
+	 * iaa_comp_acompress()/iaa_comp_adecompress(),
+	 * clear the CRYPTO_ACOMP_REQ_POLL bit on all acomp_reqs, and the
+	 * CRYPTO_TFM_REQ_CHAIN bit on the reqs[0].
+	 */
+	iaa_set_req_poll(reqs, nr_pages, false);
+	if (!async_poll)
+		acomp_reqchain_clear(reqs[0], wait);
+
+	return !err;
+}
+
+/**
+ * This API provides IAA decompress batching functionality for use by swap
+ * modules.
+ *
+ * @reqs: @nr_pages asynchronous decompress requests.
+ * @wait: crypto_wait for acomp batch decompress implemented using request
+ *        chaining. Required if async_mode is "false". If async_mode is "true",
+ *        and @wait is NULL, the completions will be processed using
+ *        asynchronous polling of the requests' completion statuses.
+ * @srcs: The src buffers to be decompressed by IAA.
+ * @pages: The pages to store the decompressed buffers.
+ * @slens: Compressed lengths of @srcs.
+ * @errors: zero on successful decompression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages, up to IAA_CRYPTO_MAX_BATCH_SIZE,
+ *            to be decompressed.
+ *
+ * Returns true if all decompress requests complete successfully,
+ * false otherwise.
+ */
+static bool iaa_comp_adecompress_batch(
+	struct acomp_req *reqs[],
+	struct crypto_wait *wait,
+	u8 *srcs[],
+	struct page *pages[],
+	unsigned int slens[],
+	int errors[],
+	int nr_pages)
+{
+	struct scatterlist inputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+	struct scatterlist outputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+	unsigned int dlens[IAA_CRYPTO_MAX_BATCH_SIZE];
+	bool decompressions_done = false;
+	bool async = (async_mode && !use_irq);
+	bool async_poll = (async && !wait);
+	int i, err = 0;
+
+	BUG_ON(nr_pages > IAA_CRYPTO_MAX_BATCH_SIZE);
+	BUG_ON(!async && !wait);
+
+	if (async)
+		iaa_set_req_poll(reqs, nr_pages, true);
+	else
+		iaa_set_req_poll(reqs, nr_pages, false);
+
+	/*
+	 * Prepare and submit acomp_reqs to IAA. IAA will process these
+	 * decompress jobs in parallel if async_mode is true.
+	 */
+	for (i = 0; i < nr_pages; ++i) {
+		dlens[i] = PAGE_SIZE;
+		sg_init_one(&inputs[i], srcs[i], slens[i]);
+		sg_init_table(&outputs[i], 1);
+		sg_set_page(&outputs[i], pages[i], PAGE_SIZE, 0);
+		acomp_request_set_params(reqs[i], &inputs[i],
+					 &outputs[i], slens[i], dlens[i]);
+
+		/*
+		 * As long as the API is called with a valid "wait", chain the
+		 * requests for synchronous/asynchronous decompress ops.
+		 * If async_mode is in effect, but the API is called with a
+		 * NULL "wait", submit the requests first, and poll for
+		 * their completion status later, after all descriptors have
+		 * been submitted.
+		 */
+		if (!async_poll) {
+			/* acomp request chaining. */
+			if (i)
+				acomp_request_chain(reqs[i], reqs[0]);
+			else
+				acomp_reqchain_init(reqs[0], 0, crypto_req_done,
+						    wait);
+		} else {
+			errors[i] = iaa_comp_adecompress(reqs[i]);
+
+			if (errors[i] != -EINPROGRESS) {
+				errors[i] = -EINVAL;
+				err = -EINVAL;
+			} else {
+				errors[i] = -EAGAIN;
+			}
+		}
+	}
+
+	if (!async_poll) {
+		if (async)
+			/* Process the request chain in parallel. */
+			err = crypto_wait_req(acomp_do_async_req_chain(reqs[0],
+					      iaa_comp_adecompress, iaa_comp_poll),
+					      wait);
+		else
+			/* Process the request chain in series. */
+			err = crypto_wait_req(acomp_do_req_chain(reqs[0],
+					      iaa_comp_adecompress), wait);
+
+		for (i = 0; i < nr_pages; ++i) {
+			errors[i] = acomp_request_err(reqs[i]);
+			if (errors[i]) {
+				err = -EINVAL;
+				pr_debug("Request chaining req %d decompress error %d\n", i, errors[i]);
+			} else {
+				dlens[i] = reqs[i]->dlen;
+				BUG_ON(dlens[i] != PAGE_SIZE);
+			}
+		}
+
+		goto reset_reqs;
+	}
+
+	/*
+	 * Asynchronously poll for and process IAA decompress job completions.
+	 */
+	while (!decompressions_done) {
+		decompressions_done = true;
+
+		for (i = 0; i < nr_pages; ++i) {
+			/*
+			 * Skip, if the decompression has already completed
+			 * successfully or with an error.
+			 */
+			if (errors[i] != -EAGAIN)
+				continue;
+
+			errors[i] = iaa_comp_poll(reqs[i]);
+
+			if (errors[i]) {
+				if (errors[i] == -EAGAIN)
+					decompressions_done = false;
+				else
+					err = -EINVAL;
+			} else {
+				dlens[i] = reqs[i]->dlen;
+				BUG_ON(dlens[i] != PAGE_SIZE);
+			}
+		}
+	}
+
+reset_reqs:
+	/*
+	 * For the same 'reqs[]' to be usable by
+	 * iaa_comp_acompress()/iaa_comp_adecompress(),
+	 * clear the CRYPTO_ACOMP_REQ_POLL bit on all acomp_reqs, and the
+	 * CRYPTO_TFM_REQ_CHAIN bit on the reqs[0].
+	 */
+	iaa_set_req_poll(reqs, nr_pages, false);
+	if (!async_poll)
+		acomp_reqchain_clear(reqs[0], wait);
+
+	return !err;
+}
+
 static int iaa_comp_init_fixed(struct crypto_acomp *acomp_tfm)
 {
 	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp_tfm);
@@ -1832,10 +2222,13 @@ static struct acomp_alg iaa_acomp_fixed_deflate = {
 	.compress = iaa_comp_acompress,
 	.decompress = iaa_comp_adecompress,
 	.dst_free = dst_free,
+	.get_batch_size = iaa_comp_get_batch_size,
+	.batch_compress = iaa_comp_acompress_batch,
+	.batch_decompress = iaa_comp_adecompress_batch,
 	.base = {
 		.cra_name = "deflate",
 		.cra_driver_name = "deflate-iaa",
-		.cra_flags = CRYPTO_ALG_ASYNC,
+		.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_REQ_CHAIN,
 		.cra_ctxsize = sizeof(struct iaa_compression_ctx),
 		.cra_module = THIS_MODULE,
 		.cra_priority = IAA_ALG_PRIORITY,
-- 
2.27.0

From: Kanchana P Sridhar
Subject: [PATCH v5 05/12] crypto: iaa - Make async mode the default.
Date: Fri, 20 Dec 2024 22:31:12 -0800

This patch makes the iaa_crypto driver load IAA hardware acceleration by
default in the most efficient/recommended "async" mode for parallel
compressions/decompressions, namely, asynchronous submission of
descriptors followed by polling for job completions, with request
chaining. Previously, "sync" mode was the default.
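
The "asynchronous submission of descriptors followed by polling for job completions" pattern can be modelled with a small userspace simulation. It is a hypothetical sketch: struct fake_job, fake_submit(), fake_poll() and submit_then_poll() are invented stand-ins for IAA descriptor submission and iaa_comp_poll(). It mirrors the structure of the polling path in iaa_comp_acompress_batch() (errors[] seeded with -EAGAIN after submission, then round-robin polling until nothing is pending), but none of its DMA or request-chaining details.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_BATCH 8 /* mirrors IAA_CRYPTO_MAX_BATCH_SIZE in the patch */

/* Hypothetical job: completes after polls_left more polls, with result. */
struct fake_job {
	int polls_left;
	int result;
	bool submit_fails;
};

/* Stand-in for descriptor submission: -EINPROGRESS when accepted. */
static int fake_submit(struct fake_job *job)
{
	return job->submit_fails ? -EBUSY : -EINPROGRESS;
}

/* Stand-in for a completion poll: -EAGAIN until the job is done. */
static int fake_poll(struct fake_job *job)
{
	if (job->polls_left > 0) {
		job->polls_left--;
		return -EAGAIN;
	}
	return job->result;
}

/*
 * Submits every job first, then polls them round-robin until none is
 * pending. errors[] ends up holding each job's final status. Returns true
 * only if every job succeeded; *rounds counts the polling passes.
 */
static bool submit_then_poll(struct fake_job jobs[], int errors[], int nr,
			     int *rounds)
{
	bool done = false, ok = true;

	assert(nr <= MAX_BATCH);
	for (int i = 0; i < nr; i++) {
		if (fake_submit(&jobs[i]) != -EINPROGRESS) {
			errors[i] = -EINVAL;
			ok = false;
		} else {
			errors[i] = -EAGAIN; /* submitted, not yet complete */
		}
	}

	*rounds = 0;
	while (!done) {
		done = true;
		(*rounds)++;
		for (int i = 0; i < nr; i++) {
			if (errors[i] != -EAGAIN)
				continue; /* already finished: success or error */
			errors[i] = fake_poll(&jobs[i]);
			if (errors[i] == -EAGAIN)
				done = false;
			else if (errors[i])
				ok = false;
		}
	}
	return ok;
}
```

Because every job is submitted before the first poll, the accelerator can work on all of them concurrently; the trade-off is that the submitting CPU busy-polls while any job is still pending.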
With async mode as the default, anyone who wants to use IAA for
zswap/zram can do so right after building the kernel, without having to
go through these steps to use async request chaining:

 1) disable all the IAA device/wq bindings that happen at boot time
 2) rmmod iaa_crypto
 3) modprobe iaa_crypto
 4) echo async > /sys/bus/dsa/drivers/crypto/sync_mode
 5) re-run initialization of the IAA devices and wqs

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index b51b0b4b9ac3..6d49f82165fe 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -153,7 +153,7 @@ static DRIVER_ATTR_RW(verify_compress);
  */
 
 /* Use async mode */
-static bool async_mode;
+static bool async_mode = true;
 /* Use interrupts */
 static bool use_irq;
 
-- 
2.27.0

From: Kanchana P Sridhar
Subject: [PATCH v5 06/12] crypto: iaa - Disable iaa_verify_compress by default.
Date: Fri, 20 Dec 2024 22:31:13 -0800

This patch makes the iaa_crypto driver load with "iaa_verify_compress"
disabled by default, to facilitate performance comparisons with software
compressors (which also do not run compress verification by default).
Previously, iaa_crypto compress verification was enabled by default.
With this patch, users who want to enable compress verification can do
so with these steps:

 1) disable all the IAA device/wq bindings that happen at boot time
 2) rmmod iaa_crypto
 3) modprobe iaa_crypto
 4) echo 1 > /sys/bus/dsa/drivers/crypto/verify_compress
 5) re-run initialization of the IAA devices and wqs

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 6d49f82165fe..f4807a828034 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -94,7 +94,7 @@ static bool iaa_crypto_enabled;
 static bool iaa_crypto_registered;
 
 /* Verify results of IAA compress or not */
-static bool iaa_verify_compress = true;
+static bool iaa_verify_compress = false;
 
 static ssize_t verify_compress_show(struct device_driver *driver, char *buf)
 {
-- 
2.27.0
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v5 07/12] crypto: iaa - Re-organize the iaa_crypto driver code.
Date: Fri, 20 Dec 2024 22:31:14 -0800
Message-Id: <20241221063119.29140-8-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>
References: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>

This patch only reorganizes the code in iaa_crypto_main.c, grouping the
functions into logically related sub-sections. This is expected to make
the code more maintainable and to make it easier to replace functional
layers and/or add new features.

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 540 +++++++++++----------
 1 file changed, 275 insertions(+), 265 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index f4807a828034..2c5b7ce041d6 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -24,6 +24,9 @@
 
 #define IAA_ALG_PRIORITY 300
 
+/**************************************
+ * Driver internal global variables.
+ **************************************/
 /* number of iaa instances probed */
 static unsigned int nr_iaa;
 static unsigned int nr_cpus;
@@ -36,55 +39,46 @@ static unsigned int cpus_per_iaa;
 static struct crypto_comp *deflate_generic_tfm;
 
 /* Per-cpu lookup table for balanced wqs */
-static struct wq_table_entry __percpu *wq_table;
+static struct wq_table_entry __percpu *wq_table = NULL;
 
-static struct idxd_wq *wq_table_next_wq(int cpu)
-{
-	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
-
-	if (++entry->cur_wq >= entry->n_wqs)
-		entry->cur_wq = 0;
-
-	if (!entry->wqs[entry->cur_wq])
-		return NULL;
-
-	pr_debug("%s: returning wq at idx %d (iaa wq %d.%d) from cpu %d\n", __func__,
-		 entry->cur_wq, entry->wqs[entry->cur_wq]->idxd->id,
-		 entry->wqs[entry->cur_wq]->id, cpu);
-
-	return entry->wqs[entry->cur_wq];
-}
-
-static void wq_table_add(int cpu, struct idxd_wq *wq)
-{
-	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
-
-	if (WARN_ON(entry->n_wqs == entry->max_wqs))
-		return;
-
-	entry->wqs[entry->n_wqs++] = wq;
-
-	pr_debug("%s: added iaa wq %d.%d to idx %d of cpu %d\n", __func__,
-		 entry->wqs[entry->n_wqs - 1]->idxd->id,
-		 entry->wqs[entry->n_wqs - 1]->id, entry->n_wqs - 1, cpu);
-}
-
-static void wq_table_free_entry(int cpu)
-{
-	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
+/* Verify results of IAA compress or not */
+static bool iaa_verify_compress = false;
 
-	kfree(entry->wqs);
-	memset(entry, 0, sizeof(*entry));
-}
+/*
+ * The iaa crypto driver supports three 'sync' methods determining how
+ * compressions and decompressions are performed:
+ *
+ * - sync:      the compression or decompression completes before
+ *              returning. This is the mode used by the async crypto
+ *              interface when the sync mode is set to 'sync' and by
+ *              the sync crypto interface regardless of setting.
+ *
+ * - async:     the compression or decompression is submitted and returns
+ *              immediately. Completion interrupts are not used so
+ *              the caller is responsible for polling the descriptor
+ *              for completion. This mode is applicable to only the
+ *              async crypto interface and is ignored for anything
+ *              else.
+ *
+ * - async_irq: the compression or decompression is submitted and
+ *              returns immediately. Completion interrupts are
+ *              enabled so the caller can wait for the completion and
+ *              yield to other threads. When the compression or
+ *              decompression completes, the completion is signaled
+ *              and the caller awakened. This mode is applicable to
+ *              only the async crypto interface and is ignored for
+ *              anything else.
+ *
+ * These modes can be set using the iaa_crypto sync_mode driver
+ * attribute.
+ */
 
-static void wq_table_clear_entry(int cpu)
-{
-	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
+/* Use async mode */
+static bool async_mode = true;
+/* Use interrupts */
+static bool use_irq;
 
-	entry->n_wqs = 0;
-	entry->cur_wq = 0;
-	memset(entry->wqs, 0, entry->max_wqs * sizeof(struct idxd_wq *));
-}
+static struct iaa_compression_mode *iaa_compression_modes[IAA_COMP_MODES_MAX];
 
 LIST_HEAD(iaa_devices);
 DEFINE_MUTEX(iaa_devices_lock);
@@ -93,9 +87,9 @@ DEFINE_MUTEX(iaa_devices_lock);
 static bool iaa_crypto_enabled;
 static bool iaa_crypto_registered;
 
-/* Verify results of IAA compress or not */
-static bool iaa_verify_compress = false;
-
+/**************************************************
+ * Driver attributes along with get/set functions.
+ **************************************************/
 static ssize_t verify_compress_show(struct device_driver *driver, char *buf)
 {
 	return sprintf(buf, "%d\n", iaa_verify_compress);
@@ -123,40 +117,6 @@ static ssize_t verify_compress_store(struct device_driver *driver,
 }
 static DRIVER_ATTR_RW(verify_compress);
 
-/*
- * The iaa crypto driver supports three 'sync' methods determining how
- * compressions and decompressions are performed:
- *
- * - sync:      the compression or decompression completes before
- *              returning. This is the mode used by the async crypto
- *              interface when the sync mode is set to 'sync' and by
- *              the sync crypto interface regardless of setting.
- *
- * - async:     the compression or decompression is submitted and returns
- *              immediately. Completion interrupts are not used so
- *              the caller is responsible for polling the descriptor
- *              for completion. This mode is applicable to only the
- *              async crypto interface and is ignored for anything
- *              else.
- *
- * - async_irq: the compression or decompression is submitted and
- *              returns immediately. Completion interrupts are
- *              enabled so the caller can wait for the completion and
- *              yield to other threads. When the compression or
- *              decompression completes, the completion is signaled
- *              and the caller awakened. This mode is applicable to
- *              only the async crypto interface and is ignored for
- *              anything else.
- *
- * These modes can be set using the iaa_crypto sync_mode driver
- * attribute.
- */
-
-/* Use async mode */
-static bool async_mode = true;
-/* Use interrupts */
-static bool use_irq;
-
 /**
  * set_iaa_sync_mode - Set IAA sync mode
  * @name: The name of the sync mode
@@ -219,8 +179,9 @@ static ssize_t sync_mode_store(struct device_driver *driver,
 }
 static DRIVER_ATTR_RW(sync_mode);
 
-static struct iaa_compression_mode *iaa_compression_modes[IAA_COMP_MODES_MAX];
-
+/****************************
+ * Driver compression modes.
+ ****************************/
 static int find_empty_iaa_compression_mode(void)
 {
 	int i = -EINVAL;
@@ -411,11 +372,6 @@ static void free_device_compression_mode(struct iaa_device *iaa_device,
 					 IDXD_OP_FLAG_WR_SRC2_AECS_COMP | \
 					 IDXD_OP_FLAG_AECS_RW_TGLS)
 
-static int check_completion(struct device *dev,
-			    struct iax_completion_record *comp,
-			    bool compress,
-			    bool only_once);
-
 static int init_device_compression_mode(struct iaa_device *iaa_device,
 					struct iaa_compression_mode *mode,
 					int idx, struct idxd_wq *wq)
@@ -502,6 +458,10 @@ static void remove_device_compression_modes(struct iaa_device *iaa_device)
 	}
 }
 
+/***********************************************************
+ * Functions for use in crypto probe and remove interfaces:
+ * allocate/init/query/deallocate devices/wqs.
+ ***********************************************************/
 static struct iaa_device *iaa_device_alloc(void)
 {
 	struct iaa_device *iaa_device;
@@ -614,16 +574,6 @@ static void del_iaa_wq(struct iaa_device *iaa_device, struct idxd_wq *wq)
 	}
 }
 
-static void clear_wq_table(void)
-{
-	int cpu;
-
-	for (cpu = 0; cpu < nr_cpus; cpu++)
-		wq_table_clear_entry(cpu);
-
-	pr_debug("cleared wq table\n");
-}
-
 static void free_iaa_device(struct iaa_device *iaa_device)
 {
 	if (!iaa_device)
@@ -704,43 +654,6 @@ static int iaa_wq_put(struct idxd_wq *wq)
 	return ret;
 }
 
-static void free_wq_table(void)
-{
-	int cpu;
-
-	for (cpu = 0; cpu < nr_cpus; cpu++)
-		wq_table_free_entry(cpu);
-
-	free_percpu(wq_table);
-
-	pr_debug("freed wq table\n");
-}
-
-static int alloc_wq_table(int max_wqs)
-{
-	struct wq_table_entry *entry;
-	int cpu;
-
-	wq_table = alloc_percpu(struct wq_table_entry);
-	if (!wq_table)
-		return -ENOMEM;
-
-	for (cpu = 0; cpu < nr_cpus; cpu++) {
-		entry = per_cpu_ptr(wq_table, cpu);
-		entry->wqs = kcalloc(max_wqs, sizeof(struct wq *), GFP_KERNEL);
-		if (!entry->wqs) {
-			free_wq_table();
-			return -ENOMEM;
-		}
-
-		entry->max_wqs = max_wqs;
-	}
-
-	pr_debug("initialized wq table\n");
-
-	return 0;
-}
-
 static int save_iaa_wq(struct idxd_wq *wq)
 {
 	struct iaa_device *iaa_device, *found = NULL;
@@ -829,6 +742,87 @@ static void remove_iaa_wq(struct idxd_wq *wq)
 		cpus_per_iaa = 1;
 }
 
+/***************************************************************
+ * Mapping IAA devices and wqs to cores with per-cpu wq_tables.
+ ***************************************************************/
+static void wq_table_free_entry(int cpu)
+{
+	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
+
+	kfree(entry->wqs);
+	memset(entry, 0, sizeof(*entry));
+}
+
+static void wq_table_clear_entry(int cpu)
+{
+	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
+
+	entry->n_wqs = 0;
+	entry->cur_wq = 0;
+	memset(entry->wqs, 0, entry->max_wqs * sizeof(struct idxd_wq *));
+}
+
+static void clear_wq_table(void)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		wq_table_clear_entry(cpu);
+
+	pr_debug("cleared wq table\n");
+}
+
+static void free_wq_table(void)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		wq_table_free_entry(cpu);
+
+	free_percpu(wq_table);
+
+	pr_debug("freed wq table\n");
+}
+
+static int alloc_wq_table(int max_wqs)
+{
+	struct wq_table_entry *entry;
+	int cpu;
+
+	wq_table = alloc_percpu(struct wq_table_entry);
+	if (!wq_table)
+		return -ENOMEM;
+
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		entry = per_cpu_ptr(wq_table, cpu);
+		entry->wqs = kcalloc(max_wqs, sizeof(struct wq *), GFP_KERNEL);
+		if (!entry->wqs) {
+			free_wq_table();
+			return -ENOMEM;
+		}
+
+		entry->max_wqs = max_wqs;
+	}
+
+	pr_debug("initialized wq table\n");
+
+	return 0;
+}
+
+static void wq_table_add(int cpu, struct idxd_wq *wq)
+{
+	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
+
+	if (WARN_ON(entry->n_wqs == entry->max_wqs))
+		return;
+
+	entry->wqs[entry->n_wqs++] = wq;
+
+	pr_debug("%s: added iaa wq %d.%d to idx %d of cpu %d\n", __func__,
+		 entry->wqs[entry->n_wqs - 1]->idxd->id,
+		 entry->wqs[entry->n_wqs - 1]->id, entry->n_wqs - 1, cpu);
+}
+
 static int wq_table_add_wqs(int iaa, int cpu)
 {
 	struct iaa_device *iaa_device, *found_device = NULL;
@@ -939,6 +933,29 @@ static void rebalance_wq_table(void)
 	}
 }
 
+/***************************************************************
+ * Assign work-queues for driver ops using per-cpu wq_tables.
+ ***************************************************************/
+static struct idxd_wq *wq_table_next_wq(int cpu)
+{
+	struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
+
+	if (++entry->cur_wq >= entry->n_wqs)
+		entry->cur_wq = 0;
+
+	if (!entry->wqs[entry->cur_wq])
+		return NULL;
+
+	pr_debug("%s: returning wq at idx %d (iaa wq %d.%d) from cpu %d\n", __func__,
+		 entry->cur_wq, entry->wqs[entry->cur_wq]->idxd->id,
+		 entry->wqs[entry->cur_wq]->id, cpu);
+
+	return entry->wqs[entry->cur_wq];
+}
+
+/*************************************************
+ * Core iaa_crypto compress/decompress functions.
+ *************************************************/
 static inline int check_completion(struct device *dev,
 				   struct iax_completion_record *comp,
 				   bool compress,
@@ -1020,13 +1037,130 @@ static int deflate_generic_decompress(struct acomp_req *req)
 
 static int iaa_remap_for_verify(struct device *dev, struct iaa_wq *iaa_wq,
 				struct acomp_req *req,
-				dma_addr_t *src_addr, dma_addr_t *dst_addr);
+				dma_addr_t *src_addr, dma_addr_t *dst_addr)
+{
+	int ret = 0;
+	int nr_sgs;
+
+	dma_unmap_sg(dev, req->dst, sg_nents(req->dst), DMA_FROM_DEVICE);
+	dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_TO_DEVICE);
+
+	nr_sgs = dma_map_sg(dev, req->src, sg_nents(req->src), DMA_FROM_DEVICE);
+	if (nr_sgs <= 0 || nr_sgs > 1) {
+		dev_dbg(dev, "verify: couldn't map src sg for iaa device %d,"
+			" wq %d: ret=%d\n", iaa_wq->iaa_device->idxd->id,
+			iaa_wq->wq->id, ret);
+		ret = -EIO;
+		goto out;
+	}
+	*src_addr = sg_dma_address(req->src);
+	dev_dbg(dev, "verify: dma_map_sg, src_addr %llx, nr_sgs %d, req->src %p,"
+		" req->slen %d, sg_dma_len(sg) %d\n", *src_addr, nr_sgs,
+		req->src, req->slen, sg_dma_len(req->src));
+
+	nr_sgs = dma_map_sg(dev, req->dst, sg_nents(req->dst), DMA_TO_DEVICE);
+	if (nr_sgs <= 0 || nr_sgs > 1) {
+		dev_dbg(dev, "verify: couldn't map dst sg for iaa device %d,"
+			" wq %d: ret=%d\n", iaa_wq->iaa_device->idxd->id,
+			iaa_wq->wq->id, ret);
+		ret = -EIO;
+		dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_FROM_DEVICE);
+		goto out;
+	}
+	*dst_addr = sg_dma_address(req->dst);
+	dev_dbg(dev, "verify: dma_map_sg, dst_addr %llx, nr_sgs %d, req->dst %p,"
+		" req->dlen %d, sg_dma_len(sg) %d\n", *dst_addr, nr_sgs,
+		req->dst, req->dlen, sg_dma_len(req->dst));
+out:
+	return ret;
+}
 
 static int iaa_compress_verify(struct crypto_tfm *tfm, struct acomp_req *req,
 			       struct idxd_wq *wq,
 			       dma_addr_t src_addr, unsigned int slen,
 			       dma_addr_t dst_addr, unsigned int *dlen,
-			       u32 compression_crc);
+			       u32 compression_crc)
+{
+	struct iaa_device_compression_mode *active_compression_mode;
+	struct iaa_compression_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct iaa_device *iaa_device;
+	struct idxd_desc *idxd_desc;
+	struct iax_hw_desc *desc;
+	struct idxd_device *idxd;
+	struct iaa_wq *iaa_wq;
+	struct pci_dev *pdev;
+	struct device *dev;
+	int ret = 0;
+
+	iaa_wq = idxd_wq_get_private(wq);
+	iaa_device = iaa_wq->iaa_device;
+	idxd = iaa_device->idxd;
+	pdev = idxd->pdev;
+	dev = &pdev->dev;
+
+	active_compression_mode = get_iaa_device_compression_mode(iaa_device, ctx->mode);
+
+	idxd_desc = idxd_alloc_desc(wq, IDXD_OP_BLOCK);
+	if (IS_ERR(idxd_desc)) {
+		dev_dbg(dev, "idxd descriptor allocation failed\n");
+		dev_dbg(dev, "iaa compress failed: ret=%ld\n",
+			PTR_ERR(idxd_desc));
+		return PTR_ERR(idxd_desc);
+	}
+	desc = idxd_desc->iax_hw;
+
+	/* Verify (optional) - decompress and check crc, suppress dest write */
+
+	desc->flags = IDXD_OP_FLAG_CRAV | IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_CC;
+	desc->opcode = IAX_OPCODE_DECOMPRESS;
+	desc->decompr_flags = IAA_DECOMP_FLAGS | IAA_DECOMP_SUPPRESS_OUTPUT;
+	desc->priv = 0;
+
+	desc->src1_addr = (u64)dst_addr;
+	desc->src1_size = *dlen;
+	desc->dst_addr = (u64)src_addr;
+	desc->max_dst_size = slen;
+	desc->completion_addr = idxd_desc->compl_dma;
+
+	dev_dbg(dev, "(verify) compression mode %s,"
+		" desc->src1_addr %llx, desc->src1_size %d,"
+		" desc->dst_addr %llx, desc->max_dst_size %d,"
+		" desc->src2_addr %llx, desc->src2_size %d\n",
+		active_compression_mode->name,
+		desc->src1_addr, desc->src1_size, desc->dst_addr,
+		desc->max_dst_size, desc->src2_addr, desc->src2_size);
+
+	ret = idxd_submit_desc(wq, idxd_desc);
+	if (ret) {
+		dev_dbg(dev, "submit_desc (verify) failed ret=%d\n", ret);
+		goto err;
+	}
+
+	ret = check_completion(dev, idxd_desc->iax_completion, false, false);
+	if (ret) {
+		dev_dbg(dev, "(verify) check_completion failed ret=%d\n", ret);
+		goto err;
+	}
+
+	if (compression_crc != idxd_desc->iax_completion->crc) {
+		ret = -EINVAL;
+		dev_dbg(dev, "(verify) iaa comp/decomp crc mismatch:"
+			" comp=0x%x, decomp=0x%x\n", compression_crc,
+			idxd_desc->iax_completion->crc);
+		print_hex_dump(KERN_INFO, "cmp-rec: ", DUMP_PREFIX_OFFSET,
+			       8, 1, idxd_desc->iax_completion, 64, 0);
+		goto err;
+	}
+
+	idxd_free_desc(wq, idxd_desc);
+out:
+	return ret;
+err:
+	idxd_free_desc(wq, idxd_desc);
+	dev_dbg(dev, "iaa compress failed: ret=%d\n", ret);
+
+	goto out;
+}
 
 static void iaa_desc_complete(struct idxd_desc *idxd_desc,
 			      enum idxd_complete_type comp_type,
@@ -1245,133 +1379,6 @@ static int iaa_compress(struct crypto_tfm *tfm, struct acomp_req *req,
 	goto out;
 }
 
-static int iaa_remap_for_verify(struct device *dev, struct iaa_wq *iaa_wq,
-				struct acomp_req *req,
-				dma_addr_t *src_addr, dma_addr_t *dst_addr)
-{
-	int ret = 0;
-	int nr_sgs;
-
-	dma_unmap_sg(dev, req->dst, sg_nents(req->dst), DMA_FROM_DEVICE);
-	dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_TO_DEVICE);
-
-	nr_sgs = dma_map_sg(dev, req->src, sg_nents(req->src), DMA_FROM_DEVICE);
-	if (nr_sgs <= 0 || nr_sgs > 1) {
-		dev_dbg(dev, "verify: couldn't map src sg for iaa device %d,"
-			" wq %d: ret=%d\n", iaa_wq->iaa_device->idxd->id,
-			iaa_wq->wq->id, ret);
-		ret = -EIO;
-		goto out;
-	}
-	*src_addr = sg_dma_address(req->src);
-	dev_dbg(dev, "verify: dma_map_sg, src_addr %llx, nr_sgs %d, req->src %p,"
" req->slen %d, sg_dma_len(sg) %d\n", *src_addr, nr_sgs, - req->src, req->slen, sg_dma_len(req->src)); - - nr_sgs =3D dma_map_sg(dev, req->dst, sg_nents(req->dst), DMA_TO_DEVICE); - if (nr_sgs <=3D 0 || nr_sgs > 1) { - dev_dbg(dev, "verify: couldn't map dst sg for iaa device %d," - " wq %d: ret=3D%d\n", iaa_wq->iaa_device->idxd->id, - iaa_wq->wq->id, ret); - ret =3D -EIO; - dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_FROM_DEVICE); - goto out; - } - *dst_addr =3D sg_dma_address(req->dst); - dev_dbg(dev, "verify: dma_map_sg, dst_addr %llx, nr_sgs %d, req->dst %p," - " req->dlen %d, sg_dma_len(sg) %d\n", *dst_addr, nr_sgs, - req->dst, req->dlen, sg_dma_len(req->dst)); -out: - return ret; -} - -static int iaa_compress_verify(struct crypto_tfm *tfm, struct acomp_req *r= eq, - struct idxd_wq *wq, - dma_addr_t src_addr, unsigned int slen, - dma_addr_t dst_addr, unsigned int *dlen, - u32 compression_crc) -{ - struct iaa_device_compression_mode *active_compression_mode; - struct iaa_compression_ctx *ctx =3D crypto_tfm_ctx(tfm); - struct iaa_device *iaa_device; - struct idxd_desc *idxd_desc; - struct iax_hw_desc *desc; - struct idxd_device *idxd; - struct iaa_wq *iaa_wq; - struct pci_dev *pdev; - struct device *dev; - int ret =3D 0; - - iaa_wq =3D idxd_wq_get_private(wq); - iaa_device =3D iaa_wq->iaa_device; - idxd =3D iaa_device->idxd; - pdev =3D idxd->pdev; - dev =3D &pdev->dev; - - active_compression_mode =3D get_iaa_device_compression_mode(iaa_device, c= tx->mode); - - idxd_desc =3D idxd_alloc_desc(wq, IDXD_OP_BLOCK); - if (IS_ERR(idxd_desc)) { - dev_dbg(dev, "idxd descriptor allocation failed\n"); - dev_dbg(dev, "iaa compress failed: ret=3D%ld\n", - PTR_ERR(idxd_desc)); - return PTR_ERR(idxd_desc); - } - desc =3D idxd_desc->iax_hw; - - /* Verify (optional) - decompress and check crc, suppress dest write */ - - desc->flags =3D IDXD_OP_FLAG_CRAV | IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_CC; - desc->opcode =3D IAX_OPCODE_DECOMPRESS; - desc->decompr_flags =3D 
IAA_DECOMP_FLAGS | IAA_DECOMP_SUPPRESS_OUTPUT; - desc->priv =3D 0; - - desc->src1_addr =3D (u64)dst_addr; - desc->src1_size =3D *dlen; - desc->dst_addr =3D (u64)src_addr; - desc->max_dst_size =3D slen; - desc->completion_addr =3D idxd_desc->compl_dma; - - dev_dbg(dev, "(verify) compression mode %s," - " desc->src1_addr %llx, desc->src1_size %d," - " desc->dst_addr %llx, desc->max_dst_size %d," - " desc->src2_addr %llx, desc->src2_size %d\n", - active_compression_mode->name, - desc->src1_addr, desc->src1_size, desc->dst_addr, - desc->max_dst_size, desc->src2_addr, desc->src2_size); - - ret =3D idxd_submit_desc(wq, idxd_desc); - if (ret) { - dev_dbg(dev, "submit_desc (verify) failed ret=3D%d\n", ret); - goto err; - } - - ret =3D check_completion(dev, idxd_desc->iax_completion, false, false); - if (ret) { - dev_dbg(dev, "(verify) check_completion failed ret=3D%d\n", ret); - goto err; - } - - if (compression_crc !=3D idxd_desc->iax_completion->crc) { - ret =3D -EINVAL; - dev_dbg(dev, "(verify) iaa comp/decomp crc mismatch:" - " comp=3D0x%x, decomp=3D0x%x\n", compression_crc, - idxd_desc->iax_completion->crc); - print_hex_dump(KERN_INFO, "cmp-rec: ", DUMP_PREFIX_OFFSET, - 8, 1, idxd_desc->iax_completion, 64, 0); - goto err; - } - - idxd_free_desc(wq, idxd_desc); -out: - return ret; -err: - idxd_free_desc(wq, idxd_desc); - dev_dbg(dev, "iaa compress failed: ret=3D%d\n", ret); - - goto out; -} - static int iaa_decompress(struct crypto_tfm *tfm, struct acomp_req *req, struct idxd_wq *wq, dma_addr_t src_addr, unsigned int slen, @@ -2197,6 +2204,9 @@ static bool iaa_comp_adecompress_batch( return !err; } =20 +/********************************************* + * Interfaces to crypto_alg and crypto_acomp. 
+ *********************************************/
 static int iaa_comp_init_fixed(struct crypto_acomp *acomp_tfm)
 {
 	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp_tfm);
-- 
2.27.0

From nobody Thu Jan 2 14:56:04 2025
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v5 08/12] crypto: iaa - Map IAA devices/wqs to cores based on packages instead of NUMA.
Date: Fri, 20 Dec 2024 22:31:15 -0800
Message-Id: <20241221063119.29140-9-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>
References: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>

This patch changes the algorithm that maps the available IAA devices and
wqs to cores, as they are discovered, so that it is based on packages
instead of NUMA nodes. This more realistically treats the IAA devices as
compression/decompression resources of a package, rather than of a NUMA
node. It also resolves problems observed during internal validation on
Intel platforms with many more NUMA nodes than packages: on such
platforms, the earlier NUMA-based allocation caused some IAAs to be
over-subscribed and others not to be used at all.

As a result of this change from NUMA nodes to packages, some of the core
functions used by the iaa_crypto driver's "probe" and "remove" API have
been rewritten. The new infrastructure maintains a static/global mapping
of "local wqs" per IAA device, in "struct iaa_device" itself. The
earlier implementation allocated this data per-cpu, even though it never
changes once the IAA devices/wqs have been initialized.

The two main outcomes of this new iaa_crypto driver infrastructure are:

1) It resolves the "task blocked for more than x seconds" errors
   observed during internal validation on Intel systems with the earlier
   NUMA-node-based mappings; these were root-caused to the non-optimal
   IAA-to-core mappings described above.

2) It reduces the memory footprint of initializing IAA devices/wqs by a
   factor of NUM_THREADS, by eliminating the per-cpu copies of each IAA
   device's wqs.
   On a 384-core Intel Granite Rapids server with 8 IAA devices, this
   saves 140MiB.

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |  17 +-
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 276 ++++++++++++---------
 2 files changed, 171 insertions(+), 122 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index b3b67c44ec8a..74d25e62df12 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -55,6 +55,7 @@ struct iaa_wq {
 	struct idxd_wq *wq;
 	int ref;
 	bool remove;
+	bool mapped;
 
 	struct iaa_device *iaa_device;
 
@@ -72,6 +73,13 @@ struct iaa_device_compression_mode {
 	dma_addr_t aecs_comp_table_dma_addr;
 };
 
+struct wq_table_entry {
+	struct idxd_wq **wqs;
+	int max_wqs;
+	int n_wqs;
+	int cur_wq;
+};
+
 /* Representation of IAA device with wqs, populated by probe */
 struct iaa_device {
 	struct list_head list;
@@ -82,19 +90,14 @@ struct iaa_device {
 	int n_wq;
 	struct list_head wqs;
 
+	struct wq_table_entry *iaa_local_wqs;
+
 	atomic64_t comp_calls;
 	atomic64_t comp_bytes;
 	atomic64_t decomp_calls;
 	atomic64_t decomp_bytes;
 };
 
-struct wq_table_entry {
-	struct idxd_wq **wqs;
-	int max_wqs;
-	int n_wqs;
-	int cur_wq;
-};
-
 #define IAA_AECS_ALIGN 32
 
 /*
diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 2c5b7ce041d6..418f78454875 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -30,8 +30,9 @@
 /* number of iaa instances probed */
 static unsigned int nr_iaa;
 static unsigned int nr_cpus;
-static unsigned int nr_nodes;
-static unsigned int nr_cpus_per_node;
+static unsigned int nr_packages;
+static unsigned int nr_cpus_per_package;
+static unsigned int nr_iaa_per_package;
 
 /* Number of physical cpus sharing each iaa instance */
 static unsigned int cpus_per_iaa;
@@ -462,17 +463,46 @@ static void remove_device_compression_modes(struct iaa_device *iaa_device)
  * Functions for use in crypto probe and remove interfaces:
  * allocate/init/query/deallocate devices/wqs.
  ***********************************************************/
-static struct iaa_device *iaa_device_alloc(void)
+static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd)
 {
+	struct wq_table_entry *local;
 	struct iaa_device *iaa_device;
 
 	iaa_device = kzalloc(sizeof(*iaa_device), GFP_KERNEL);
 	if (!iaa_device)
-		return NULL;
+		goto err;
+
+	iaa_device->idxd = idxd;
+
+	/* IAA device's local wqs. */
+	iaa_device->iaa_local_wqs = kzalloc(sizeof(struct wq_table_entry), GFP_KERNEL);
+	if (!iaa_device->iaa_local_wqs)
+		goto err;
+
+	local = iaa_device->iaa_local_wqs;
+
+	local->wqs = kzalloc(iaa_device->idxd->max_wqs * sizeof(struct wq *), GFP_KERNEL);
+	if (!local->wqs)
+		goto err;
+
+	local->max_wqs = iaa_device->idxd->max_wqs;
+	local->n_wqs = 0;
 
 	INIT_LIST_HEAD(&iaa_device->wqs);
 
 	return iaa_device;
+
+err:
+	if (iaa_device) {
+		if (iaa_device->iaa_local_wqs) {
+			if (iaa_device->iaa_local_wqs->wqs)
+				kfree(iaa_device->iaa_local_wqs->wqs);
+			kfree(iaa_device->iaa_local_wqs);
+		}
+		kfree(iaa_device);
+	}
+
+	return NULL;
 }
 
 static bool iaa_has_wq(struct iaa_device *iaa_device, struct idxd_wq *wq)
@@ -491,12 +521,10 @@ static struct iaa_device *add_iaa_device(struct idxd_device *idxd)
 {
 	struct iaa_device *iaa_device;
 
-	iaa_device = iaa_device_alloc();
+	iaa_device = iaa_device_alloc(idxd);
 	if (!iaa_device)
 		return NULL;
 
-	iaa_device->idxd = idxd;
-
 	list_add_tail(&iaa_device->list, &iaa_devices);
 
 	nr_iaa++;
@@ -537,6 +565,7 @@ static int add_iaa_wq(struct iaa_device *iaa_device, struct idxd_wq *wq,
 	iaa_wq->wq = wq;
 	iaa_wq->iaa_device = iaa_device;
 	idxd_wq_set_private(wq, iaa_wq);
+	iaa_wq->mapped = false;
 
 	list_add_tail(&iaa_wq->list, &iaa_device->wqs);
 
@@ -580,6 +609,13 @@ static void free_iaa_device(struct iaa_device *iaa_device)
return; =20 remove_device_compression_modes(iaa_device); + + if (iaa_device->iaa_local_wqs) { + if (iaa_device->iaa_local_wqs->wqs) + kfree(iaa_device->iaa_local_wqs->wqs); + kfree(iaa_device->iaa_local_wqs); + } + kfree(iaa_device); } =20 @@ -716,9 +752,14 @@ static int save_iaa_wq(struct idxd_wq *wq) if (WARN_ON(nr_iaa =3D=3D 0)) return -EINVAL; =20 - cpus_per_iaa =3D (nr_nodes * nr_cpus_per_node) / nr_iaa; + cpus_per_iaa =3D (nr_packages * nr_cpus_per_package) / nr_iaa; if (!cpus_per_iaa) cpus_per_iaa =3D 1; + + nr_iaa_per_package =3D nr_iaa / nr_packages; + if (!nr_iaa_per_package) + nr_iaa_per_package =3D 1; + out: return 0; } @@ -735,53 +776,45 @@ static void remove_iaa_wq(struct idxd_wq *wq) } =20 if (nr_iaa) { - cpus_per_iaa =3D (nr_nodes * nr_cpus_per_node) / nr_iaa; + cpus_per_iaa =3D (nr_packages * nr_cpus_per_package) / nr_iaa; if (!cpus_per_iaa) cpus_per_iaa =3D 1; - } else + + nr_iaa_per_package =3D nr_iaa / nr_packages; + if (!nr_iaa_per_package) + nr_iaa_per_package =3D 1; + } else { cpus_per_iaa =3D 1; + nr_iaa_per_package =3D 1; + } } =20 /*************************************************************** * Mapping IAA devices and wqs to cores with per-cpu wq_tables. ***************************************************************/ -static void wq_table_free_entry(int cpu) -{ - struct wq_table_entry *entry =3D per_cpu_ptr(wq_table, cpu); - - kfree(entry->wqs); - memset(entry, 0, sizeof(*entry)); -} - -static void wq_table_clear_entry(int cpu) -{ - struct wq_table_entry *entry =3D per_cpu_ptr(wq_table, cpu); - - entry->n_wqs =3D 0; - entry->cur_wq =3D 0; - memset(entry->wqs, 0, entry->max_wqs * sizeof(struct idxd_wq *)); -} - -static void clear_wq_table(void) +/* + * Given a cpu, find the closest IAA instance. The idea is to try to + * choose the most appropriate IAA instance for a caller and spread + * available workqueues around to clients. 
+ */ +static inline int cpu_to_iaa(int cpu) { - int cpu; - - for (cpu =3D 0; cpu < nr_cpus; cpu++) - wq_table_clear_entry(cpu); + int package_id, base_iaa, iaa =3D 0; =20 - pr_debug("cleared wq table\n"); -} + if (!nr_packages || !nr_iaa_per_package) + return 0; =20 -static void free_wq_table(void) -{ - int cpu; + package_id =3D topology_logical_package_id(cpu); + base_iaa =3D package_id * nr_iaa_per_package; + iaa =3D base_iaa + ((cpu % nr_cpus_per_package) / cpus_per_iaa); =20 - for (cpu =3D 0; cpu < nr_cpus; cpu++) - wq_table_free_entry(cpu); + pr_debug("cpu =3D %d, package_id =3D %d, base_iaa =3D %d, iaa =3D %d", + cpu, package_id, base_iaa, iaa); =20 - free_percpu(wq_table); + if (iaa >=3D 0 && iaa < nr_iaa) + return iaa; =20 - pr_debug("freed wq table\n"); + return (nr_iaa - 1); } =20 static int alloc_wq_table(int max_wqs) @@ -795,13 +828,11 @@ static int alloc_wq_table(int max_wqs) =20 for (cpu =3D 0; cpu < nr_cpus; cpu++) { entry =3D per_cpu_ptr(wq_table, cpu); - entry->wqs =3D kcalloc(max_wqs, sizeof(struct wq *), GFP_KERNEL); - if (!entry->wqs) { - free_wq_table(); - return -ENOMEM; - } =20 + entry->wqs =3D NULL; entry->max_wqs =3D max_wqs; + entry->n_wqs =3D 0; + entry->cur_wq =3D 0; } =20 pr_debug("initialized wq table\n"); @@ -809,33 +840,27 @@ static int alloc_wq_table(int max_wqs) return 0; } =20 -static void wq_table_add(int cpu, struct idxd_wq *wq) +static void wq_table_add(int cpu, struct wq_table_entry *iaa_local_wqs) { struct wq_table_entry *entry =3D per_cpu_ptr(wq_table, cpu); =20 - if (WARN_ON(entry->n_wqs =3D=3D entry->max_wqs)) - return; - - entry->wqs[entry->n_wqs++] =3D wq; + entry->wqs =3D iaa_local_wqs->wqs; + entry->max_wqs =3D iaa_local_wqs->max_wqs; + entry->n_wqs =3D iaa_local_wqs->n_wqs; + entry->cur_wq =3D 0; =20 - pr_debug("%s: added iaa wq %d.%d to idx %d of cpu %d\n", __func__, + pr_debug("%s: cpu %d: added %d iaa local wqs up to wq %d.%d\n", __func__, + cpu, entry->n_wqs, entry->wqs[entry->n_wqs - 1]->idxd->id, - 
entry->wqs[entry->n_wqs - 1]->id, entry->n_wqs - 1, cpu); + entry->wqs[entry->n_wqs - 1]->id); } =20 static int wq_table_add_wqs(int iaa, int cpu) { struct iaa_device *iaa_device, *found_device =3D NULL; - int ret =3D 0, cur_iaa =3D 0, n_wqs_added =3D 0; - struct idxd_device *idxd; - struct iaa_wq *iaa_wq; - struct pci_dev *pdev; - struct device *dev; + int ret =3D 0, cur_iaa =3D 0; =20 list_for_each_entry(iaa_device, &iaa_devices, list) { - idxd =3D iaa_device->idxd; - pdev =3D idxd->pdev; - dev =3D &pdev->dev; =20 if (cur_iaa !=3D iaa) { cur_iaa++; @@ -843,7 +868,8 @@ static int wq_table_add_wqs(int iaa, int cpu) } =20 found_device =3D iaa_device; - dev_dbg(dev, "getting wq from iaa_device %d, cur_iaa %d\n", + dev_dbg(&found_device->idxd->pdev->dev, + "getting wq from iaa_device %d, cur_iaa %d\n", found_device->idxd->id, cur_iaa); break; } @@ -858,29 +884,58 @@ static int wq_table_add_wqs(int iaa, int cpu) } cur_iaa =3D 0; =20 - idxd =3D found_device->idxd; - pdev =3D idxd->pdev; - dev =3D &pdev->dev; - dev_dbg(dev, "getting wq from only iaa_device %d, cur_iaa %d\n", + dev_dbg(&found_device->idxd->pdev->dev, + "getting wq from only iaa_device %d, cur_iaa %d\n", found_device->idxd->id, cur_iaa); } =20 - list_for_each_entry(iaa_wq, &found_device->wqs, list) { - wq_table_add(cpu, iaa_wq->wq); - pr_debug("rebalance: added wq for cpu=3D%d: iaa wq %d.%d\n", - cpu, iaa_wq->wq->idxd->id, iaa_wq->wq->id); - n_wqs_added++; + wq_table_add(cpu, found_device->iaa_local_wqs); + +out: + return ret; +} + +static int map_iaa_device_wqs(struct iaa_device *iaa_device) +{ + struct wq_table_entry *local; + int ret =3D 0, n_wqs_added =3D 0; + struct iaa_wq *iaa_wq; + + local =3D iaa_device->iaa_local_wqs; + + list_for_each_entry(iaa_wq, &iaa_device->wqs, list) { + if (iaa_wq->mapped && ++n_wqs_added) + continue; + + pr_debug("iaa_device %px: processing wq %d.%d\n", iaa_device, iaa_device= ->idxd->id, iaa_wq->wq->id); + + if (WARN_ON(local->n_wqs =3D=3D local->max_wqs)) + break; + + 
local->wqs[local->n_wqs++] =3D iaa_wq->wq; + pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_devic= e->idxd->id, iaa_wq->wq->id); + + iaa_wq->mapped =3D true; + ++n_wqs_added; } =20 - if (!n_wqs_added) { - pr_debug("couldn't find any iaa wqs!\n"); + if (!n_wqs_added && !iaa_device->n_wq) { + pr_debug("iaa_device %d: couldn't find any iaa wqs!\n", iaa_device->idxd= ->id); ret =3D -EINVAL; - goto out; } -out: + return ret; } =20 +static void map_iaa_devices(void) +{ + struct iaa_device *iaa_device; + + list_for_each_entry(iaa_device, &iaa_devices, list) { + BUG_ON(map_iaa_device_wqs(iaa_device)); + } +} + /* * Rebalance the wq table so that given a cpu, it's easy to find the * closest IAA instance. The idea is to try to choose the most @@ -889,48 +944,42 @@ static int wq_table_add_wqs(int iaa, int cpu) */ static void rebalance_wq_table(void) { - const struct cpumask *node_cpus; - int node, cpu, iaa =3D -1; + int cpu, iaa; =20 if (nr_iaa =3D=3D 0) return; =20 - pr_debug("rebalance: nr_nodes=3D%d, nr_cpus %d, nr_iaa %d, cpus_per_iaa %= d\n", - nr_nodes, nr_cpus, nr_iaa, cpus_per_iaa); + map_iaa_devices(); =20 - clear_wq_table(); + pr_debug("rebalance: nr_packages=3D%d, nr_cpus %d, nr_iaa %d, cpus_per_ia= a %d\n", + nr_packages, nr_cpus, nr_iaa, cpus_per_iaa); =20 - if (nr_iaa =3D=3D 1) { - for (cpu =3D 0; cpu < nr_cpus; cpu++) { - if (WARN_ON(wq_table_add_wqs(0, cpu))) { - pr_debug("could not add any wqs for iaa 0 to cpu %d!\n", cpu); - return; - } + for (cpu =3D 0; cpu < nr_cpus; cpu++) { + iaa =3D cpu_to_iaa(cpu); + pr_debug("rebalance: cpu=3D%d iaa=3D%d\n", cpu, iaa); + + if (WARN_ON(iaa =3D=3D -1)) { + pr_debug("rebalance (cpu_to_iaa(%d)) failed!\n", cpu); + return; } =20 - return; + if (WARN_ON(wq_table_add_wqs(iaa, cpu))) { + pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu); + return; + } } =20 - for_each_node_with_cpus(node) { - node_cpus =3D cpumask_of_node(node); - - for (cpu =3D 0; cpu < cpumask_weight(node_cpus); cpu++) 
{ - int node_cpu =3D cpumask_nth(cpu, node_cpus); - - if (WARN_ON(node_cpu >=3D nr_cpu_ids)) { - pr_debug("node_cpu %d doesn't exist!\n", node_cpu); - return; - } - - if ((cpu % cpus_per_iaa) =3D=3D 0) - iaa++; + pr_debug("Finished rebalance local wqs."); +} =20 - if (WARN_ON(wq_table_add_wqs(iaa, node_cpu))) { - pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu); - return; - } - } +static void free_wq_tables(void) +{ + if (wq_table) { + free_percpu(wq_table); + wq_table =3D NULL; } + + pr_debug("freed local wq table\n"); } =20 /*************************************************************** @@ -2347,7 +2396,7 @@ static int iaa_crypto_probe(struct idxd_dev *idxd_dev) free_iaa_wq(idxd_wq_get_private(wq)); err_save: if (first_wq) - free_wq_table(); + free_wq_tables(); err_alloc: mutex_unlock(&iaa_devices_lock); idxd_drv_disable_wq(wq); @@ -2397,7 +2446,9 @@ static void iaa_crypto_remove(struct idxd_dev *idxd_d= ev) =20 if (nr_iaa =3D=3D 0) { iaa_crypto_enabled =3D false; - free_wq_table(); + free_wq_tables(); + BUG_ON(!list_empty(&iaa_devices)); + INIT_LIST_HEAD(&iaa_devices); module_put(THIS_MODULE); =20 pr_info("iaa_crypto now DISABLED\n"); @@ -2423,16 +2474,11 @@ static struct idxd_device_driver iaa_crypto_driver = =3D { static int __init iaa_crypto_init_module(void) { int ret =3D 0; - int node; + INIT_LIST_HEAD(&iaa_devices); =20 nr_cpus =3D num_possible_cpus(); - for_each_node_with_cpus(node) - nr_nodes++; - if (!nr_nodes) { - pr_err("IAA couldn't find any nodes with cpus\n"); - return -ENODEV; - } - nr_cpus_per_node =3D nr_cpus / nr_nodes; + nr_cpus_per_package =3D topology_num_cores_per_package(); + nr_packages =3D topology_max_packages(); =20 if (crypto_has_comp("deflate-generic", 0, 0)) deflate_generic_tfm =3D crypto_alloc_comp("deflate-generic", 0, 0); --=20 2.27.0 From nobody Thu Jan 2 14:56:04 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) 
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v5 09/12] crypto: iaa - Distribute compress jobs from all cores to all IAAs on a package.
Date: Fri, 20 Dec 2024 22:31:16 -0800
Message-Id: <20241221063119.29140-10-kanchana.p.sridhar@intel.com>
In-Reply-To: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>
References: <20241221063119.29140-1-kanchana.p.sridhar@intel.com>

This change enables processes running on any logical core of a package to use all the IAA devices enabled on that package for compress jobs. In other words, compress jobs originating from any process on a package are distributed round-robin across the available IAA devices on that same package.

The premise behind this change is that no IAA device's compress engines should be left idle, under-utilized or over-utilized. The compress engines on all IAA devices in a package are treated as a single shared resource for that package, to maximize compression throughput. This allows (batched) compressions originating from zswap/zram on any core of a package to use all the IAA devices on that package.

A new per-cpu "global_wq_table" implements this in the iaa_crypto driver. The global WQ of each IAA can be thought of as a WQ to which all cores on that package can submit compress jobs.

To use this feature, the user must configure 2 WQs per IAA, so that compress jobs can be distributed across multiple IAA devices. Each IAA then has 2 WQs:

 wq.0 (local WQ): Used for decompress jobs from the cores that the cpu_to_iaa() algorithm maps to this IAA. cpu_to_iaa() evenly balances the logical cores of a package across the IAA devices on that package.

 wq.1 (global WQ): Used for compress jobs from *all* logical cores on that package.

The iaa_crypto driver places the global WQs of all IAA devices on a package into the per-cpu global_wq_table of every cpu on that package.
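The local/global split above can be modeled in a few lines of user-space C. This is a minimal sketch, not driver code: sketch_wq_is_local() and sketch_count_local_wqs() are hypothetical helpers, and the condition is modeled on the one this patch adds to map_iaa_device_wqs(), assuming a device's WQs are mapped in a single pass in list order (so the count of already-mapped WQs equals the WQ's index).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the driver: decides whether the WQ at
 * position idx (0-based, in the device's WQ list order) becomes a local
 * WQ or a global WQ. Modeled on the condition in map_iaa_device_wqs(),
 * assuming all WQs are mapped in one pass so that n_wqs_added == idx.
 */
static bool sketch_wq_is_local(int idx, int n_wq, int g_wqs_per_iaa)
{
	assert(idx >= 0 && idx < n_wq);

	/* The first WQ is always local, so decompress always has a WQ. */
	if (idx == 0)
		return true;

	/* The last g_wqs_per_iaa WQs of the device become global WQs. */
	return idx + g_wqs_per_iaa < n_wq;
}

/* Hypothetical helper: counts the local WQs a device would end up with. */
static int sketch_count_local_wqs(int n_wq, int g_wqs_per_iaa)
{
	int idx, n_local = 0;

	for (idx = 0; idx < n_wq; idx++)
		if (sketch_wq_is_local(idx, n_wq, g_wqs_per_iaa))
			n_local++;

	return n_local;
}
```

With 2 WQs per IAA and g_wqs_per_iaa set to 1, wq.0 comes out local and wq.1 global; with g_wqs_per_iaa left at its default of 0, both WQs stay local and the device contributes no global WQs.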
When the driver receives a compress job, it looks up the "next" global WQ in the submitting cpu's global_wq_table and submits the descriptor to that WQ. If no global WQs are configured for the cpu's package, the compress job falls back to the cpu's local WQ.

Each cpu starts at a different position in its global_wq_table: the starting WQ is the global WQ of the IAA that cpu_to_iaa() assigns to that cpu (the IAA "nearest" to it), so the starting global WQ is staggered across cpus. This results in very uniform usage of all IAAs for compress jobs.

Two new driver attributes are added for this feature. Both can only be changed while iaa_crypto is not enabled; otherwise a write returns -EBUSY.

g_wqs_per_iaa (default 0):

 /sys/bus/dsa/drivers/crypto/g_wqs_per_iaa

 The number of global WQs that can be configured per IAA device. Setting it to 1 is recommended to enable this feature once the user configures 2 WQs per IAA using the higher-level scripts described in Documentation/driver-api/crypto/iaa/iaa-crypto.rst.

g_consec_descs_per_gwq (default 1):

 /sys/bus/dsa/drivers/crypto/g_consec_descs_per_gwq

 The number of consecutive compress jobs that a given core submits to the same global WQ (i.e. to the same IAA device) before moving on to the next global WQ. The default of 1 is also the recommended setting for this feature.

Decompress jobs from any core are sent to its "local" IAA, namely the one assigned to that core by the cpu_to_iaa() mapping algorithm, which evenly balances the assignment of logical cores to the IAA devices on a package.
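The per-cpu selection of the next global WQ, together with the cpu_to_iaa() arithmetic that picks each cpu's starting WQ, can be sketched in user-space C for the 2-package Sapphire Rapids example (56 cores per package, 4 IAAs per package, SMT siblings numbered cpu + 112). This is an illustration under those assumptions, not the driver's code: every sketch_/SK_ name is hypothetical, sketch_package_id() stands in for topology_logical_package_id() and is only valid for that layout, and the logic is modeled on cpu_to_iaa(), global_wq_table_set_start_wq() and global_wq_table_next_wq() in this patch.

```c
#include <assert.h>

/*
 * Assumed example topology: 2 packages, 56 cores per package, 8 IAAs in
 * total, SMT siblings numbered cpu + 112. All SK_/sketch_ names are
 * hypothetical.
 */
#define SK_NR_PACKAGES       2
#define SK_CORES_PER_PACKAGE 56
#define SK_NR_IAA            8
#define SK_IAA_PER_PACKAGE   (SK_NR_IAA / SK_NR_PACKAGES)                        /* 4 */
#define SK_CPUS_PER_IAA      (SK_NR_PACKAGES * SK_CORES_PER_PACKAGE / SK_NR_IAA) /* 14 */

/* Stand-in for topology_logical_package_id(); valid only for this layout. */
static int sketch_package_id(int cpu)
{
	return (cpu % (SK_NR_PACKAGES * SK_CORES_PER_PACKAGE)) / SK_CORES_PER_PACKAGE;
}

/* Modeled on cpu_to_iaa(): package base plus an even share of cores. */
static int sketch_cpu_to_iaa(int cpu)
{
	int iaa = sketch_package_id(cpu) * SK_IAA_PER_PACKAGE +
		  (cpu % SK_CORES_PER_PACKAGE) / SK_CPUS_PER_IAA;

	return iaa < SK_NR_IAA ? iaa : SK_NR_IAA - 1;
}

/* Per-cpu state, standing in for global_wq_table and num_consec_descs_per_wq. */
struct sketch_gwq_state {
	int n_wqs;    /* global WQs on this cpu's package */
	int cur_wq;   /* index of the global WQ currently in use */
	int n_consec; /* jobs already sent to cur_wq */
};

/* Stagger the starting WQ: begin at the global WQ(s) of this cpu's own IAA. */
static void sketch_gwq_init(struct sketch_gwq_state *s, int cpu, int n_wqs,
			    int g_wqs_per_iaa)
{
	int start = g_wqs_per_iaa * (sketch_cpu_to_iaa(cpu) % SK_IAA_PER_PACKAGE);

	s->n_wqs = n_wqs;
	s->cur_wq = (start >= 0 && start < n_wqs) ? start : 0;
	s->n_consec = 0;
}

/* Round-robin, moving on after g_consec_descs_per_gwq consecutive jobs. */
static int sketch_gwq_next(struct sketch_gwq_state *s, int g_consec_descs_per_gwq)
{
	assert(s->n_wqs > 0 && g_consec_descs_per_gwq > 0);

	if (s->n_consec == g_consec_descs_per_gwq) {
		if (++s->cur_wq >= s->n_wqs)
			s->cur_wq = 0;
		s->n_consec = 0;
	}
	s->n_consec++;

	return s->cur_wq;
}
```

For cpu 20, sketch_cpu_to_iaa() returns 1 (the second IAA on package 0), so with one global WQ per IAA and g_consec_descs_per_gwq at 1, its compress jobs start at global WQ 1 and then cycle through 2, 3, 0, and so on. Keeping the consecutive-job counter per cpu means no cross-cpu state is shared on the submission path.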
On a 2-package Sapphire Rapids server where each package has 56 cores and 4 IAA devices, compress and decompress jobs are mapped as follows when the user configures 2 WQs per IAA device (which implies that each IAA's wq.1 is added to the global WQ table of every logical core on that package):

 package(s):        2
 package0 CPU(s):   0-55,112-167
 package1 CPU(s):   56-111,168-223

Compress jobs:
--------------

package 0:
iaa_crypto sends compress jobs from all cpus (0-55,112-167) to all IAA devices on the package (iax1/iax3/iax5/iax7) in round-robin order:

 iaa: iax1 iax3 iax5 iax7

package 1:
iaa_crypto sends compress jobs from all cpus (56-111,168-223) to all IAA devices on the package (iax9/iax11/iax13/iax15) in round-robin order:

 iaa: iax9 iax11 iax13 iax15

Decompress jobs:
----------------

package 0:

 cpu  0-13,112-125    14-27,126-139   28-41,140-153   42-55,154-167
 iaa  iax1            iax3            iax5            iax7

package 1:

 cpu  56-69,168-181   70-83,182-195   84-97,196-209   98-111,210-223
 iaa  iax9            iax11           iax13           iax15

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |   1 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 385 ++++++++++++++++++++-
 2 files changed, 378 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index 74d25e62df12..c46c70ecf355 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -91,6 +91,7 @@ struct iaa_device {
 	struct list_head wqs;
 
 	struct wq_table_entry *iaa_local_wqs;
+	struct wq_table_entry *iaa_global_wqs;
 
 	atomic64_t comp_calls;
 	atomic64_t comp_bytes;
diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 418f78454875..4ca9028d6050 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -42,6 +42,18 @@ static struct crypto_comp *deflate_generic_tfm;
 /* Per-cpu lookup table for balanced wqs */
 static struct wq_table_entry __percpu
*wq_table =3D NULL; =20 +static struct wq_table_entry **pkg_global_wq_tables =3D NULL; + +/* Per-cpu lookup table for global wqs shared by all cpus. */ +static struct wq_table_entry __percpu *global_wq_table =3D NULL; + +/* + * Per-cpu counter of consecutive descriptors allocated to + * the same wq in the global_wq_table, so that we know + * when to switch to the next wq in the global_wq_table. + */ +static int __percpu *num_consec_descs_per_wq =3D NULL; + /* Verify results of IAA compress or not */ static bool iaa_verify_compress =3D false; =20 @@ -79,6 +91,16 @@ static bool async_mode =3D true; /* Use interrupts */ static bool use_irq; =20 +/* Number of global wqs per iaa*/ +static int g_wqs_per_iaa =3D 0; + +/* + * Number of consecutive descriptors to allocate from a + * given global wq before switching to the next wq in + * the global_wq_table. + */ +static int g_consec_descs_per_gwq =3D 1; + static struct iaa_compression_mode *iaa_compression_modes[IAA_COMP_MODES_M= AX]; =20 LIST_HEAD(iaa_devices); @@ -180,6 +202,60 @@ static ssize_t sync_mode_store(struct device_driver *d= river, } static DRIVER_ATTR_RW(sync_mode); =20 +static ssize_t g_wqs_per_iaa_show(struct device_driver *driver, char *buf) +{ + return sprintf(buf, "%d\n", g_wqs_per_iaa); +} + +static ssize_t g_wqs_per_iaa_store(struct device_driver *driver, + const char *buf, size_t count) +{ + int ret =3D -EBUSY; + + mutex_lock(&iaa_devices_lock); + + if (iaa_crypto_enabled) + goto out; + + ret =3D kstrtoint(buf, 10, &g_wqs_per_iaa); + if (ret) + goto out; + + ret =3D count; +out: + mutex_unlock(&iaa_devices_lock); + + return ret; +} +static DRIVER_ATTR_RW(g_wqs_per_iaa); + +static ssize_t g_consec_descs_per_gwq_show(struct device_driver *driver, c= har *buf) +{ + return sprintf(buf, "%d\n", g_consec_descs_per_gwq); +} + +static ssize_t g_consec_descs_per_gwq_store(struct device_driver *driver, + const char *buf, size_t count) +{ + int ret =3D -EBUSY; + + mutex_lock(&iaa_devices_lock); + + if 
(iaa_crypto_enabled) + goto out; + + ret =3D kstrtoint(buf, 10, &g_consec_descs_per_gwq); + if (ret) + goto out; + + ret =3D count; +out: + mutex_unlock(&iaa_devices_lock); + + return ret; +} +static DRIVER_ATTR_RW(g_consec_descs_per_gwq); + /**************************** * Driver compression modes. ****************************/ @@ -465,7 +541,7 @@ static void remove_device_compression_modes(struct iaa_= device *iaa_device) ***********************************************************/ static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd) { - struct wq_table_entry *local; + struct wq_table_entry *local, *global; struct iaa_device *iaa_device; =20 iaa_device =3D kzalloc(sizeof(*iaa_device), GFP_KERNEL); @@ -488,6 +564,20 @@ static struct iaa_device *iaa_device_alloc(struct idxd= _device *idxd) local->max_wqs =3D iaa_device->idxd->max_wqs; local->n_wqs =3D 0; =20 + /* IAA device's global wqs. */ + iaa_device->iaa_global_wqs =3D kzalloc(sizeof(struct wq_table_entry), GFP= _KERNEL); + if (!iaa_device->iaa_global_wqs) + goto err; + + global =3D iaa_device->iaa_global_wqs; + + global->wqs =3D kzalloc(iaa_device->idxd->max_wqs * sizeof(struct wq *), = GFP_KERNEL); + if (!global->wqs) + goto err; + + global->max_wqs =3D iaa_device->idxd->max_wqs; + global->n_wqs =3D 0; + INIT_LIST_HEAD(&iaa_device->wqs); =20 return iaa_device; @@ -499,6 +589,8 @@ static struct iaa_device *iaa_device_alloc(struct idxd_= device *idxd) kfree(iaa_device->iaa_local_wqs->wqs); kfree(iaa_device->iaa_local_wqs); } + if (iaa_device->iaa_global_wqs) + kfree(iaa_device->iaa_global_wqs); kfree(iaa_device); } =20 @@ -616,6 +708,12 @@ static void free_iaa_device(struct iaa_device *iaa_dev= ice) kfree(iaa_device->iaa_local_wqs); } =20 + if (iaa_device->iaa_global_wqs) { + if (iaa_device->iaa_global_wqs->wqs) + kfree(iaa_device->iaa_global_wqs->wqs); + kfree(iaa_device->iaa_global_wqs); + } + kfree(iaa_device); } =20 @@ -817,6 +915,58 @@ static inline int cpu_to_iaa(int cpu) return (nr_iaa - 
1); } =20 +static void free_global_wq_table(void) +{ + if (global_wq_table) { + free_percpu(global_wq_table); + global_wq_table =3D NULL; + } + + if (num_consec_descs_per_wq) { + free_percpu(num_consec_descs_per_wq); + num_consec_descs_per_wq =3D NULL; + } + + pr_debug("freed global wq table\n"); +} + +static int pkg_global_wq_tables_alloc(void) +{ + int i, j; + + pkg_global_wq_tables =3D kzalloc(nr_packages * sizeof(*pkg_global_wq_tabl= es), GFP_KERNEL); + if (!pkg_global_wq_tables) + return -ENOMEM; + + for (i =3D 0; i < nr_packages; ++i) { + pkg_global_wq_tables[i] =3D kzalloc(sizeof(struct wq_table_entry), GFP_K= ERNEL); + + if (!pkg_global_wq_tables[i]) { + for (j =3D 0; j < i; ++j) + kfree(pkg_global_wq_tables[j]); + kfree(pkg_global_wq_tables); + pkg_global_wq_tables =3D NULL; + return -ENOMEM; + } + pkg_global_wq_tables[i]->wqs =3D NULL; + } + + return 0; +} + +static void pkg_global_wq_tables_dealloc(void) +{ + int i; + + for (i =3D 0; i < nr_packages; ++i) { + if (pkg_global_wq_tables[i]->wqs) + kfree(pkg_global_wq_tables[i]->wqs); + kfree(pkg_global_wq_tables[i]); + } + kfree(pkg_global_wq_tables); + pkg_global_wq_tables =3D NULL; +} + static int alloc_wq_table(int max_wqs) { struct wq_table_entry *entry; @@ -835,6 +985,35 @@ static int alloc_wq_table(int max_wqs) entry->cur_wq =3D 0; } =20 + global_wq_table =3D alloc_percpu(struct wq_table_entry); + if (!global_wq_table) + return 0; + + for (cpu =3D 0; cpu < nr_cpus; cpu++) { + entry =3D per_cpu_ptr(global_wq_table, cpu); + + entry->wqs =3D NULL; + entry->max_wqs =3D max_wqs; + entry->n_wqs =3D 0; + entry->cur_wq =3D 0; + } + + num_consec_descs_per_wq =3D alloc_percpu(int); + if (!num_consec_descs_per_wq) { + free_global_wq_table(); + return 0; + } + + for (cpu =3D 0; cpu < nr_cpus; cpu++) { + int *num_consec_descs =3D per_cpu_ptr(num_consec_descs_per_wq, cpu); + *num_consec_descs =3D 0; + } + + if (pkg_global_wq_tables_alloc()) { + free_global_wq_table(); + return 0; + } + pr_debug("initialized wq 
table\n"); =20 return 0; @@ -895,13 +1074,120 @@ static int wq_table_add_wqs(int iaa, int cpu) return ret; } =20 +static void pkg_global_wq_tables_reinit(void) +{ + int i, cur_iaa =3D 0, pkg =3D 0, nr_pkg_wqs =3D 0; + struct iaa_device *iaa_device; + struct wq_table_entry *global; + + if (!pkg_global_wq_tables) + return; + + /* Reallocate per-package wqs. */ + list_for_each_entry(iaa_device, &iaa_devices, list) { + global =3D iaa_device->iaa_global_wqs; + nr_pkg_wqs +=3D global->n_wqs; + + if (++cur_iaa =3D=3D nr_iaa_per_package) { + nr_pkg_wqs =3D nr_pkg_wqs ? max_t(int, iaa_device->idxd->max_wqs, nr_pk= g_wqs) : 0; + + if (pkg_global_wq_tables[pkg]->wqs) { + kfree(pkg_global_wq_tables[pkg]->wqs); + pkg_global_wq_tables[pkg]->wqs =3D NULL; + } + + if (nr_pkg_wqs) + pkg_global_wq_tables[pkg]->wqs =3D kzalloc(nr_pkg_wqs * + sizeof(struct wq *), + GFP_KERNEL); + + pkg_global_wq_tables[pkg]->n_wqs =3D 0; + pkg_global_wq_tables[pkg]->cur_wq =3D 0; + pkg_global_wq_tables[pkg]->max_wqs =3D nr_pkg_wqs; + + if (++pkg =3D=3D nr_packages) + break; + cur_iaa =3D 0; + nr_pkg_wqs =3D 0; + } + } + + pkg =3D 0; + cur_iaa =3D 0; + + /* Re-initialize per-package wqs. */ + list_for_each_entry(iaa_device, &iaa_devices, list) { + global =3D iaa_device->iaa_global_wqs; + + if (pkg_global_wq_tables[pkg]->wqs) + for (i =3D 0; i < global->n_wqs; ++i) + pkg_global_wq_tables[pkg]->wqs[pkg_global_wq_tables[pkg]->n_wqs++] =3D= global->wqs[i]; + + pr_debug("pkg_global_wq_tables[%d] has %d wqs", pkg, pkg_global_wq_table= s[pkg]->n_wqs); + + if (++cur_iaa =3D=3D nr_iaa_per_package) { + if (++pkg =3D=3D nr_packages) + break; + cur_iaa =3D 0; + } + } +} + +static void global_wq_table_add(int cpu, struct wq_table_entry *pkg_global= _wq_table) +{ + struct wq_table_entry *entry =3D per_cpu_ptr(global_wq_table, cpu); + + /* This could be NULL. 
*/ + entry->wqs =3D pkg_global_wq_table->wqs; + entry->max_wqs =3D pkg_global_wq_table->max_wqs; + entry->n_wqs =3D pkg_global_wq_table->n_wqs; + entry->cur_wq =3D 0; + + if (entry->wqs) + pr_debug("%s: cpu %d: added %d iaa global wqs up to wq %d.%d\n", __func_= _, + cpu, entry->n_wqs, + entry->wqs[entry->n_wqs - 1]->idxd->id, + entry->wqs[entry->n_wqs - 1]->id); +} + +static void global_wq_table_set_start_wq(int cpu) +{ + struct wq_table_entry *entry =3D per_cpu_ptr(global_wq_table, cpu); + int start_wq =3D g_wqs_per_iaa * (cpu_to_iaa(cpu) % nr_iaa_per_package); + + if ((start_wq >=3D 0) && (start_wq < entry->n_wqs)) + entry->cur_wq =3D start_wq; +} + +static void global_wq_table_add_wqs(void) +{ + int cpu; + + if (!pkg_global_wq_tables) + return; + + for (cpu =3D 0; cpu < nr_cpus; cpu +=3D nr_cpus_per_package) { + /* cpu's on the same package get the same global_wq_table. */ + int package_id =3D topology_logical_package_id(cpu); + int pkg_cpu; + + for (pkg_cpu =3D cpu; pkg_cpu < cpu + nr_cpus_per_package; ++pkg_cpu) { + if (pkg_global_wq_tables[package_id]->n_wqs > 0) { + global_wq_table_add(pkg_cpu, pkg_global_wq_tables[package_id]); + global_wq_table_set_start_wq(pkg_cpu); + } + } + } +} + static int map_iaa_device_wqs(struct iaa_device *iaa_device) { - struct wq_table_entry *local; + struct wq_table_entry *local, *global; int ret =3D 0, n_wqs_added =3D 0; struct iaa_wq *iaa_wq; =20 local =3D iaa_device->iaa_local_wqs; + global =3D iaa_device->iaa_global_wqs; =20 list_for_each_entry(iaa_wq, &iaa_device->wqs, list) { if (iaa_wq->mapped && ++n_wqs_added) @@ -909,11 +1195,18 @@ static int map_iaa_device_wqs(struct iaa_device *iaa= _device) =20 pr_debug("iaa_device %px: processing wq %d.%d\n", iaa_device, iaa_device= ->idxd->id, iaa_wq->wq->id); =20 - if (WARN_ON(local->n_wqs =3D=3D local->max_wqs)) - break; + if ((!n_wqs_added || ((n_wqs_added + g_wqs_per_iaa) < iaa_device->n_wq))= && + (local->n_wqs < local->max_wqs)) { + + local->wqs[local->n_wqs++] =3D 
iaa_wq->wq; + pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_devi= ce->idxd->id, iaa_wq->wq->id); + } else { + if (WARN_ON(global->n_wqs =3D=3D global->max_wqs)) + break; =20 - local->wqs[local->n_wqs++] =3D iaa_wq->wq; - pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_devic= e->idxd->id, iaa_wq->wq->id); + global->wqs[global->n_wqs++] =3D iaa_wq->wq; + pr_debug("iaa_device %px: added global wq %d.%d\n", iaa_device, iaa_dev= ice->idxd->id, iaa_wq->wq->id); + } =20 iaa_wq->mapped =3D true; ++n_wqs_added; @@ -969,6 +1262,10 @@ static void rebalance_wq_table(void) } } =20 + if (iaa_crypto_enabled && pkg_global_wq_tables) { + pkg_global_wq_tables_reinit(); + global_wq_table_add_wqs(); + } pr_debug("Finished rebalance local wqs."); } =20 @@ -979,7 +1276,17 @@ static void free_wq_tables(void) wq_table =3D NULL; } =20 - pr_debug("freed local wq table\n"); + if (global_wq_table) { + free_percpu(global_wq_table); + global_wq_table =3D NULL; + } + + if (num_consec_descs_per_wq) { + free_percpu(num_consec_descs_per_wq); + num_consec_descs_per_wq =3D NULL; + } + + pr_debug("freed wq tables\n"); } =20 /*************************************************************** @@ -1002,6 +1309,35 @@ static struct idxd_wq *wq_table_next_wq(int cpu) return entry->wqs[entry->cur_wq]; } =20 +/* + * Caller should make sure to call only if the + * per_cpu_ptr "global_wq_table" is non-NULL + * and has at least one wq configured. + */ +static struct idxd_wq *global_wq_table_next_wq(int cpu) +{ + struct wq_table_entry *entry =3D per_cpu_ptr(global_wq_table, cpu); + int *num_consec_descs =3D per_cpu_ptr(num_consec_descs_per_wq, cpu); + + /* + * Fall-back to local IAA's wq if there were no global wqs configured + * for any IAA device, or if there were problems in setting up global + * wqs for this cpu's package. 
+	 */
+	if (!entry->wqs)
+		return wq_table_next_wq(cpu);
+
+	if ((*num_consec_descs) == g_consec_descs_per_gwq) {
+		if (++entry->cur_wq >= entry->n_wqs)
+			entry->cur_wq = 0;
+		*num_consec_descs = 0;
+	}
+
+	++(*num_consec_descs);
+
+	return entry->wqs[entry->cur_wq];
+}
+
 /*************************************************
  * Core iaa_crypto compress/decompress functions.
  *************************************************/
@@ -1563,6 +1899,7 @@ static int iaa_comp_acompress(struct acomp_req *req)
 	struct idxd_wq *wq;
 	struct device *dev;
 	int order = -1;
+	struct wq_table_entry *entry;
 
 	compression_ctx = crypto_tfm_ctx(tfm);
 
@@ -1581,8 +1918,15 @@ static int iaa_comp_acompress(struct acomp_req *req)
 		disable_async = true;
 
 	cpu = get_cpu();
-	wq = wq_table_next_wq(cpu);
+	entry = per_cpu_ptr(global_wq_table, cpu);
+
+	if (!entry || !entry->wqs || entry->n_wqs == 0) {
+		wq = wq_table_next_wq(cpu);
+	} else {
+		wq = global_wq_table_next_wq(cpu);
+	}
 	put_cpu();
+
 	if (!wq) {
 		pr_debug("no wq configured for cpu=%d\n", cpu);
 		return -ENODEV;
@@ -2446,6 +2790,7 @@ static void iaa_crypto_remove(struct idxd_dev *idxd_dev)
 
 	if (nr_iaa == 0) {
 		iaa_crypto_enabled = false;
+		pkg_global_wq_tables_dealloc();
 		free_wq_tables();
 		BUG_ON(!list_empty(&iaa_devices));
 		INIT_LIST_HEAD(&iaa_devices);
@@ -2515,6 +2860,20 @@ static int __init iaa_crypto_init_module(void)
 		goto err_sync_attr_create;
 	}
 
+	ret = driver_create_file(&iaa_crypto_driver.drv,
+				 &driver_attr_g_wqs_per_iaa);
+	if (ret) {
+		pr_debug("IAA g_wqs_per_iaa attr creation failed\n");
+		goto err_g_wqs_per_iaa_attr_create;
+	}
+
+	ret = driver_create_file(&iaa_crypto_driver.drv,
+				 &driver_attr_g_consec_descs_per_gwq);
+	if (ret) {
+		pr_debug("IAA g_consec_descs_per_gwq attr creation failed\n");
+		goto err_g_consec_descs_per_gwq_attr_create;
+	}
+
 	if (iaa_crypto_debugfs_init())
 		pr_warn("debugfs init failed, stats not available\n");
 
@@ -2522,6 +2881,12 @@ static int __init iaa_crypto_init_module(void)
 out:
 	return ret;
 
+err_g_consec_descs_per_gwq_attr_create:
+	driver_remove_file(&iaa_crypto_driver.drv,
+			   &driver_attr_g_wqs_per_iaa);
+err_g_wqs_per_iaa_attr_create:
+	driver_remove_file(&iaa_crypto_driver.drv,
+			   &driver_attr_sync_mode);
 err_sync_attr_create:
 	driver_remove_file(&iaa_crypto_driver.drv,
 			   &driver_attr_verify_compress);
@@ -2545,6 +2910,10 @@ static void __exit iaa_crypto_cleanup_module(void)
 			   &driver_attr_sync_mode);
 	driver_remove_file(&iaa_crypto_driver.drv,
 			   &driver_attr_verify_compress);
+	driver_remove_file(&iaa_crypto_driver.drv,
+			   &driver_attr_g_wqs_per_iaa);
+	driver_remove_file(&iaa_crypto_driver.drv,
+			   &driver_attr_g_consec_descs_per_gwq);
 	idxd_driver_unregister(&iaa_crypto_driver);
 	iaa_aecs_cleanup_fixed();
 	crypto_free_comp(deflate_generic_tfm);
-- 
2.27.0
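The queue selection added in global_wq_table_next_wq() sends g_consec_descs_per_gwq consecutive descriptors to the same global wq before it advances round-robin to the next one, wrapping at the end of the package's list. The user-space model below is a minimal sketch of that policy, not the driver code. struct wq_ring, next_wq() and the integer wq IDs are hypothetical stand-ins for the per-cpu global_wq_table entry, num_consec_descs_per_wq and g_consec_descs_per_gwq. A return value of -1 stands in for the fall-back to wq_table_next_wq().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of one CPU's global wq table entry.
 * Work queues are represented by plain integer IDs.
 */
struct wq_ring {
	const int *wqs;   /* IDs of the package's global wqs */
	int n_wqs;        /* number of entries in wqs[] */
	int cur_wq;       /* index of the wq currently in use */
	int num_consec;   /* descriptors already sent to cur_wq */
	int consec_limit; /* stands in for g_consec_descs_per_gwq */
};

/*
 * Return the wq ID for the next descriptor. Once consec_limit
 * consecutive descriptors have gone to the same wq, move to the next
 * wq, wrapping around at the end. Returns -1 when there are no global
 * wqs (the driver falls back to the local wq table in that case).
 */
static int next_wq(struct wq_ring *r)
{
	if (r->wqs == NULL || r->n_wqs == 0)
		return -1;

	/* With a limit of 0 the counter would never match, so require >= 1. */
	assert(r->consec_limit > 0);

	if (r->num_consec == r->consec_limit) {
		if (++r->cur_wq >= r->n_wqs)
			r->cur_wq = 0;
		r->num_consec = 0;
	}

	++r->num_consec;
	return r->wqs[r->cur_wq];
}
```

With three wqs {10, 11, 12} and a limit of 2, successive calls return 10, 10, 11, 11, 12, 12, 10. Sending short runs to one wq, rather than strictly alternating, is the trade-off the g_consec_descs_per_gwq attribute appears to expose.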
From: Kanchana P Sridhar
Subject: [PATCH v5 10/12] mm: zswap: Allocate pool batching resources if the crypto_alg supports batching.
Date: Fri, 20 Dec 2024 22:31:17 -0800
Message-Id: <20241221063119.29140-11-kanchana.p.sridhar@intel.com>

This patch does the following:

1) Defines ZSWAP_MAX_BATCH_SIZE to denote the maximum number of acomp_ctx
batching resources (acomp_reqs and buffers) to allocate if the zswap
compressor supports batching. Currently, ZSWAP_MAX_BATCH_SIZE is set to 8U.

2) Modifies the definition of "struct crypto_acomp_ctx" to represent a
configurable number of acomp_reqs and buffers. Adds a "nr_reqs" field to
"struct crypto_acomp_ctx" to hold the number of resources that are
allocated in the cpu hotplug onlining code.

3) The zswap_cpu_comp_prepare() cpu onlining code detects whether the
crypto_acomp created for the zswap pool (in other words, the zswap
compression algorithm) has registered implementations of batch_compress()
and batch_decompress(). If so, it queries the crypto_acomp for the maximum
batch size supported by the compressor, and sets "nr_reqs" to the minimum
of this compressor-specific maximum batch size and ZSWAP_MAX_BATCH_SIZE.
Finally, it allocates "nr_reqs" reqs/buffers and sets acomp_ctx->nr_reqs
accordingly.

4) If the crypto_acomp does not support batching, "nr_reqs" defaults to 1.

Signed-off-by: Kanchana P Sridhar
---
 mm/zswap.c | 122 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 90 insertions(+), 32 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 9718c33f8192..99cd78891fd0 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -78,6 +78,13 @@ static bool zswap_pool_reached_full;
 
 #define ZSWAP_PARAM_UNSET ""
 
+/*
+ * For compression batching of large folios:
+ * Maximum number of acomp compress requests that will be processed
+ * in a batch, iff the zswap compressor supports batching.
+ */
+#define ZSWAP_MAX_BATCH_SIZE 8U
+
 static int zswap_setup(void);
 
 /* Enable/disable zswap */
@@ -143,9 +150,10 @@ bool zswap_never_enabled(void)
 
 struct crypto_acomp_ctx {
 	struct crypto_acomp *acomp;
-	struct acomp_req *req;
+	struct acomp_req **reqs;
+	u8 **buffers;
+	unsigned int nr_reqs;
 	struct crypto_wait wait;
-	u8 *buffer;
 	struct mutex mutex;
 	bool is_sleepable;
 };
@@ -818,49 +826,88 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
 	struct crypto_acomp *acomp;
-	struct acomp_req *req;
-	int ret;
+	unsigned int nr_reqs = 1;
+	int ret = -ENOMEM;
+	int i, j;
 
 	mutex_init(&acomp_ctx->mutex);
-
-	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
-	if (!acomp_ctx->buffer)
-		return -ENOMEM;
+	acomp_ctx->nr_reqs = 0;
 
 	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
 	if (IS_ERR(acomp)) {
 		pr_err("could not alloc crypto acomp %s : %ld\n",
 				pool->tfm_name, PTR_ERR(acomp));
-		ret = PTR_ERR(acomp);
-		goto acomp_fail;
+		return PTR_ERR(acomp);
 	}
 	acomp_ctx->acomp = acomp;
 	acomp_ctx->is_sleepable = acomp_is_async(acomp);
 
-	req = acomp_request_alloc(acomp_ctx->acomp);
-	if (!req) {
-		pr_err("could not alloc crypto acomp_request %s\n",
-		       pool->tfm_name);
-		ret = -ENOMEM;
+	/*
+	 * Create the necessary batching resources if the crypto acomp alg
+	 * implements the batch_compress and batch_decompress API.
+	 */
+	if (acomp_has_async_batching(acomp)) {
+		nr_reqs = min(ZSWAP_MAX_BATCH_SIZE, crypto_acomp_batch_size(acomp));
+		pr_info_once("Creating acomp_ctx with %d reqs/buffers for batching since crypto acomp\n%s has registered batch_compress() and batch_decompress().\n",
+			     nr_reqs, pool->tfm_name);
+	}
+
+	acomp_ctx->buffers = kmalloc_node(nr_reqs * sizeof(u8 *), GFP_KERNEL, cpu_to_node(cpu));
+	if (!acomp_ctx->buffers)
+		goto buf_fail;
+
+	for (i = 0; i < nr_reqs; ++i) {
+		acomp_ctx->buffers[i] = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+		if (!acomp_ctx->buffers[i]) {
+			for (j = 0; j < i; ++j)
+				kfree(acomp_ctx->buffers[j]);
+			kfree(acomp_ctx->buffers);
+			ret = -ENOMEM;
+			goto buf_fail;
+		}
+	}
+
+	acomp_ctx->reqs = kmalloc_node(nr_reqs * sizeof(struct acomp_req *), GFP_KERNEL, cpu_to_node(cpu));
+	if (!acomp_ctx->reqs)
 		goto req_fail;
+
+	for (i = 0; i < nr_reqs; ++i) {
+		acomp_ctx->reqs[i] = acomp_request_alloc(acomp_ctx->acomp);
+		if (!acomp_ctx->reqs[i]) {
+			pr_err("could not alloc crypto acomp_request reqs[%d] %s\n",
+			       i, pool->tfm_name);
+			for (j = 0; j < i; ++j)
+				acomp_request_free(acomp_ctx->reqs[j]);
+			kfree(acomp_ctx->reqs);
+			ret = -ENOMEM;
+			goto req_fail;
+		}
 	}
-	acomp_ctx->req = req;
 
+	/*
+	 * The crypto_wait is used only in fully synchronous, i.e., with scomp
+	 * or non-poll mode of acomp, hence there is only one "wait" per
+	 * acomp_ctx, with callback set to reqs[0], under the assumption that
+	 * there is at least 1 request per acomp_ctx.
+	 */
 	crypto_init_wait(&acomp_ctx->wait);
 	/*
 	 * if the backend of acomp is async zip, crypto_req_done() will wakeup
 	 * crypto_wait_req(); if the backend of acomp is scomp, the callback
 	 * won't be called, crypto_wait_req() will return without blocking.
 	 */
-	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+	acomp_request_set_callback(acomp_ctx->reqs[0], CRYPTO_TFM_REQ_MAY_BACKLOG,
 				   crypto_req_done, &acomp_ctx->wait);
 
+	acomp_ctx->nr_reqs = nr_reqs;
 	return 0;
 
 req_fail:
+	for (i = 0; i < nr_reqs; ++i)
+		kfree(acomp_ctx->buffers[i]);
+	kfree(acomp_ctx->buffers);
+buf_fail:
 	crypto_free_acomp(acomp_ctx->acomp);
-acomp_fail:
-	kfree(acomp_ctx->buffer);
 	return ret;
 }
 
@@ -870,11 +917,22 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
 
 	if (!IS_ERR_OR_NULL(acomp_ctx)) {
-		if (!IS_ERR_OR_NULL(acomp_ctx->req))
-			acomp_request_free(acomp_ctx->req);
+		int i;
+
+		for (i = 0; i < acomp_ctx->nr_reqs; ++i)
+			if (!IS_ERR_OR_NULL(acomp_ctx->reqs[i]))
+				acomp_request_free(acomp_ctx->reqs[i]);
+		kfree(acomp_ctx->reqs);
+
+		for (i = 0; i < acomp_ctx->nr_reqs; ++i)
+			kfree(acomp_ctx->buffers[i]);
+		kfree(acomp_ctx->buffers);
+
 		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
 			crypto_free_acomp(acomp_ctx->acomp);
-		kfree(acomp_ctx->buffer);
+
+		acomp_ctx->nr_reqs = 0;
+		acomp_ctx = NULL;
 	}
 
 	return 0;
@@ -897,7 +955,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 
 	mutex_lock(&acomp_ctx->mutex);
 
-	dst = acomp_ctx->buffer;
+	dst = acomp_ctx->buffers[0];
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
 
@@ -907,7 +965,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 * giving the dst buffer with enough length to avoid buffer overflow.
 	 */
 	sg_init_one(&output, dst, PAGE_SIZE * 2);
-	acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
+	acomp_request_set_params(acomp_ctx->reqs[0], &input, &output, PAGE_SIZE, dlen);
 
 	/*
 	 * it maybe looks a little bit silly that we send an asynchronous request,
@@ -921,8 +979,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 * but in different threads running on different cpu, we have different
 	 * acomp instance, so multiple threads can do (de)compression in parallel.
 	 */
-	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
-	dlen = acomp_ctx->req->dlen;
+	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]), &acomp_ctx->wait);
+	dlen = acomp_ctx->reqs[0]->dlen;
 	if (comp_ret)
 		goto unlock;
 
@@ -975,20 +1033,20 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	 */
 	if ((acomp_ctx->is_sleepable && !zpool_can_sleep_mapped(zpool)) ||
 	    !virt_addr_valid(src)) {
-		memcpy(acomp_ctx->buffer, src, entry->length);
-		src = acomp_ctx->buffer;
+		memcpy(acomp_ctx->buffers[0], src, entry->length);
+		src = acomp_ctx->buffers[0];
 		zpool_unmap_handle(zpool, entry->handle);
 	}
 
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_folio(&output, folio, PAGE_SIZE, 0);
-	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
-	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
-	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
+	acomp_request_set_params(acomp_ctx->reqs[0], &input, &output, entry->length, PAGE_SIZE);
+	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->reqs[0]), &acomp_ctx->wait));
+	BUG_ON(acomp_ctx->reqs[0]->dlen != PAGE_SIZE);
 	mutex_unlock(&acomp_ctx->mutex);
 
-	if (src != acomp_ctx->buffer)
+	if (src != acomp_ctx->buffers[0])
 		zpool_unmap_handle(zpool, entry->handle);
 }
 
-- 
2.27.0
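The nr_reqs selection in zswap_cpu_comp_prepare() and its all-or-nothing allocation follow a common pattern. nr_reqs is 1 without batching support, otherwise the smaller of the compressor's limit and ZSWAP_MAX_BATCH_SIZE. On a partial failure, everything allocated so far is released. The user-space sketch below illustrates that pattern under stated assumptions. struct batch_ctx, pick_nr_reqs(), batch_ctx_init() and batch_ctx_free() are hypothetical. malloc()/free() stand in for kmalloc_node()/kfree(). Only the buffers are modelled, not the acomp_reqs.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BATCH_SIZE 8U /* mirrors ZSWAP_MAX_BATCH_SIZE in the patch */

/* Hypothetical stand-in for the per-CPU acomp_ctx batching resources. */
struct batch_ctx {
	unsigned char **buffers;
	unsigned int nr_reqs;
};

/*
 * 1 when the compressor has no batching support; otherwise the smaller
 * of the compressor's own batch limit and MAX_BATCH_SIZE.
 */
static unsigned int pick_nr_reqs(int has_batching, unsigned int alg_batch_size)
{
	if (!has_batching)
		return 1;
	return alg_batch_size < MAX_BATCH_SIZE ? alg_batch_size : MAX_BATCH_SIZE;
}

/*
 * Allocate nr_reqs buffers of buf_size bytes each. On a partial failure,
 * free everything allocated so far and leave ctx->nr_reqs at 0, so a
 * later teardown loop over nr_reqs has nothing to free.
 */
static int batch_ctx_init(struct batch_ctx *ctx, unsigned int nr_reqs, size_t buf_size)
{
	unsigned int i, j;

	/* The patch likewise assumes at least one request per context. */
	assert(nr_reqs >= 1);

	ctx->nr_reqs = 0;
	ctx->buffers = malloc(nr_reqs * sizeof(*ctx->buffers));
	if (!ctx->buffers)
		return -1;

	for (i = 0; i < nr_reqs; ++i) {
		ctx->buffers[i] = malloc(buf_size);
		if (!ctx->buffers[i]) {
			for (j = 0; j < i; ++j)
				free(ctx->buffers[j]);
			free(ctx->buffers);
			ctx->buffers = NULL;
			return -1;
		}
	}

	ctx->nr_reqs = nr_reqs; /* publish the count only on full success */
	return 0;
}

/* Teardown driven by nr_reqs, as in zswap_cpu_comp_dead(). */
static void batch_ctx_free(struct batch_ctx *ctx)
{
	unsigned int i;

	for (i = 0; i < ctx->nr_reqs; ++i)
		free(ctx->buffers[i]);
	free(ctx->buffers);
	ctx->buffers = NULL;
	ctx->nr_reqs = 0;
}
```

Setting nr_reqs only after every allocation has succeeded keeps the teardown path safe to run on a context whose setup failed. The patch appears to follow the same convention, since it assigns acomp_ctx->nr_reqs just before returning 0.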

From: Kanchana P Sridhar
Subject: [PATCH v5 11/12] mm: zswap: Restructure & simplify zswap_store() to make it amenable for batching.
Date: Fri, 20 Dec 2024 22:31:18 -0800
Message-Id: <20241221063119.29140-12-kanchana.p.sridhar@intel.com>

This patch introduces zswap_store_folio(), which performs, for all the
pages in a folio, the work that zswap_store_page() previously did for a
single page. This allows us to move the loop over the folio's pages from
zswap_store() into zswap_store_folio().

A separate zswap_compress_folio() is also added, which simply calls
zswap_compress() for each page in the folio it is given.

zswap_store_folio() starts by allocating all the zswap entries required to
store the folio. Next, it calls zswap_compress_folio() and, finally, adds
the entries to the xarray and LRU.

The error handling and cleanup for all failure scenarios that can occur
while storing a folio in zswap are now consolidated under a single
"store_folio_failed" label in zswap_store_folio().

These changes facilitate adding support for compress batching in
zswap_store_folio().
Signed-off-by: Kanchana P Sridhar
---
 mm/zswap.c | 183 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 116 insertions(+), 67 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 99cd78891fd0..1be0f1807bfc 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1467,77 +1467,129 @@ static void shrink_worker(struct work_struct *w)
 * main API
 **********************************/
 
-static ssize_t zswap_store_page(struct page *page,
-				struct obj_cgroup *objcg,
-				struct zswap_pool *pool)
+static bool zswap_compress_folio(struct folio *folio,
+				 struct zswap_entry *entries[],
+				 struct zswap_pool *pool)
 {
-	swp_entry_t page_swpentry = page_swap_entry(page);
-	struct zswap_entry *entry, *old;
+	long index, nr_pages = folio_nr_pages(folio);
 
-	/* allocate entry */
-	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
-	if (!entry) {
-		zswap_reject_kmemcache_fail++;
-		return -EINVAL;
+	for (index = 0; index < nr_pages; ++index) {
+		struct page *page = folio_page(folio, index);
+
+		if (!zswap_compress(page, entries[index], pool))
+			return false;
 	}
 
-	if (!zswap_compress(page, entry, pool))
-		goto compress_failed;
+	return true;
+}
 
-	old = xa_store(swap_zswap_tree(page_swpentry),
-		       swp_offset(page_swpentry),
-		       entry, GFP_KERNEL);
-	if (xa_is_err(old)) {
-		int err = xa_err(old);
+/*
+ * Store all pages in a folio.
+ *
+ * The error handling from all failure points is consolidated to the
+ * "store_folio_failed" label, based on the initialization of the zswap entries'
+ * handles to ERR_PTR(-EINVAL) at allocation time, and the fact that the
+ * entry's handle is subsequently modified only upon a successful zpool_malloc()
+ * after the page is compressed.
+ */
+static ssize_t zswap_store_folio(struct folio *folio,
+				 struct obj_cgroup *objcg,
+				 struct zswap_pool *pool)
+{
+	long index, nr_pages = folio_nr_pages(folio);
+	struct zswap_entry **entries = NULL;
+	int node_id = folio_nid(folio);
+	size_t compressed_bytes = 0;
 
-		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
-		zswap_reject_alloc_fail++;
-		goto store_failed;
+	entries = kmalloc(nr_pages * sizeof(*entries), GFP_KERNEL);
+	if (!entries)
+		return -ENOMEM;
+
+	/* allocate entries */
+	for (index = 0; index < nr_pages; ++index) {
+		entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
+
+		if (!entries[index]) {
+			zswap_reject_kmemcache_fail++;
+			nr_pages = index;
+			goto store_folio_failed;
+		}
+
+		entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
 	}
 
-	/*
-	 * We may have had an existing entry that became stale when
-	 * the folio was redirtied and now the new version is being
-	 * swapped out. Get rid of the old.
-	 */
-	if (old)
-		zswap_entry_free(old);
+	if (!zswap_compress_folio(folio, entries, pool))
+		goto store_folio_failed;
 
-	/*
-	 * The entry is successfully compressed and stored in the tree, there is
-	 * no further possibility of failure. Grab refs to the pool and objcg.
-	 * These refs will be dropped by zswap_entry_free() when the entry is
-	 * removed from the tree.
-	 */
-	zswap_pool_get(pool);
-	if (objcg)
-		obj_cgroup_get(objcg);
+	for (index = 0; index < nr_pages; ++index) {
+		swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));
+		struct zswap_entry *old, *entry = entries[index];
+
+		old = xa_store(swap_zswap_tree(page_swpentry),
+			       swp_offset(page_swpentry),
+			       entry, GFP_KERNEL);
+		if (xa_is_err(old)) {
+			int err = xa_err(old);
+
+			WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
+			zswap_reject_alloc_fail++;
+			goto store_folio_failed;
+		}
 
-	/*
-	 * We finish initializing the entry while it's already in xarray.
-	 * This is safe because:
-	 *
-	 * 1. Concurrent stores and invalidations are excluded by folio lock.
-	 *
-	 * 2. Writeback is excluded by the entry not being on the LRU yet.
-	 *    The publishing order matters to prevent writeback from seeing
-	 *    an incoherent entry.
-	 */
-	entry->pool = pool;
-	entry->swpentry = page_swpentry;
-	entry->objcg = objcg;
-	entry->referenced = true;
-	if (entry->length) {
-		INIT_LIST_HEAD(&entry->lru);
-		zswap_lru_add(&zswap_list_lru, entry);
+		/*
+		 * We may have had an existing entry that became stale when
+		 * the folio was redirtied and now the new version is being
+		 * swapped out. Get rid of the old.
+		 */
+		if (old)
+			zswap_entry_free(old);
+
+		/*
+		 * The entry is successfully compressed and stored in the tree, there is
+		 * no further possibility of failure. Grab refs to the pool and objcg.
+		 * These refs will be dropped by zswap_entry_free() when the entry is
+		 * removed from the tree.
+		 */
+		zswap_pool_get(pool);
+		if (objcg)
+			obj_cgroup_get(objcg);
+
+		/*
+		 * We finish initializing the entry while it's already in xarray.
+		 * This is safe because:
+		 *
+		 * 1. Concurrent stores and invalidations are excluded by folio lock.
+		 *
+		 * 2. Writeback is excluded by the entry not being on the LRU yet.
+		 *    The publishing order matters to prevent writeback from seeing
+		 *    an incoherent entry.
+		 */
+		entry->pool = pool;
+		entry->swpentry = page_swpentry;
+		entry->objcg = objcg;
+		entry->referenced = true;
+		if (entry->length) {
+			INIT_LIST_HEAD(&entry->lru);
+			zswap_lru_add(&zswap_list_lru, entry);
+		}
+
+		compressed_bytes += entry->length;
 	}
 
-	return entry->length;
+	kfree(entries);
+
+	return compressed_bytes;
+
+store_folio_failed:
+	for (index = 0; index < nr_pages; ++index) {
+		if (!IS_ERR_VALUE(entries[index]->handle))
+			zpool_free(pool->zpool, entries[index]->handle);
+
+		zswap_entry_cache_free(entries[index]);
+	}
+
+	kfree(entries);
 
-store_failed:
-	zpool_free(pool->zpool, entry->handle);
-compress_failed:
-	zswap_entry_cache_free(entry);
 	return -EINVAL;
 }
 
@@ -1549,8 +1601,8 @@ bool zswap_store(struct folio *folio)
 	struct mem_cgroup *memcg = NULL;
 	struct zswap_pool *pool;
 	size_t compressed_bytes = 0;
+	ssize_t bytes;
 	bool ret = false;
-	long index;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1584,15 +1636,11 @@ bool zswap_store(struct folio *folio)
 		mem_cgroup_put(memcg);
 	}
 
-	for (index = 0; index < nr_pages; ++index) {
-		struct page *page = folio_page(folio, index);
-		ssize_t bytes;
+	bytes = zswap_store_folio(folio, objcg, pool);
+	if (bytes < 0)
+		goto put_pool;
 
-		bytes = zswap_store_page(page, objcg, pool);
-		if (bytes < 0)
-			goto put_pool;
-		compressed_bytes += bytes;
-	}
+	compressed_bytes = bytes;
 
 	if (objcg) {
 		obj_cgroup_charge_zswap(objcg, compressed_bytes);
@@ -1622,6 +1670,7 @@ bool zswap_store(struct folio *folio)
 		pgoff_t offset = swp_offset(swp);
 		struct zswap_entry *entry;
 		struct xarray *tree;
+		long index;
 
 		for (index = 0; index < nr_pages; ++index) {
 			tree = swap_zswap_tree(swp_entry(type, offset + index));
-- 
2.27.0
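The consolidated error handling in zswap_store_folio() rests on one idea. Every entry's handle starts at a sentinel (ERR_PTR(-EINVAL) in the patch), so a single cleanup label can tell which entries already own zpool storage. The user-space sketch below illustrates that idea. struct entry, compress_page(), store_folio(), NO_HANDLE and the live_handles counter are all hypothetical. The counter stands in for zpool_malloc()/zpool_free(), and copying handles to out[] stands in for publishing entries to the xarray.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_HANDLE (-1L) /* sentinel; the patch uses ERR_PTR(-EINVAL) */

struct entry {
	long handle; /* NO_HANDLE until storage has been allocated */
};

static long live_handles; /* number of "zpool" handles currently held */

/* Hypothetical compress-and-allocate step; fails for page 'fail_at'. */
static int compress_page(struct entry *e, long index, long fail_at)
{
	if (index == fail_at)
		return -1;
	e->handle = index; /* any value other than NO_HANDLE */
	live_handles++;
	return 0;
}

/*
 * Store nr_pages pages. On success the handles are copied to out[] and
 * nr_pages is returned. Every failure jumps to one label that releases
 * only the handles that differ from the sentinel.
 */
static long store_folio(long nr_pages, long fail_at, long *out)
{
	struct entry *entries;
	long index;

	assert(nr_pages > 0);

	entries = malloc(nr_pages * sizeof(*entries));
	if (!entries)
		return -1;

	for (index = 0; index < nr_pages; ++index)
		entries[index].handle = NO_HANDLE;

	for (index = 0; index < nr_pages; ++index)
		if (compress_page(&entries[index], index, fail_at))
			goto failed;

	for (index = 0; index < nr_pages; ++index)
		out[index] = entries[index].handle;
	free(entries);
	return nr_pages;

failed:
	for (index = 0; index < nr_pages; ++index) {
		if (entries[index].handle != NO_HANDLE) {
			entries[index].handle = NO_HANDLE;
			live_handles--; /* stands in for zpool_free() */
		}
	}
	free(entries);
	return -1;
}
```

If page 2 of 4 fails, pages 0 and 1 have already taken handles. The cleanup loop gives both back and leaves the untouched entries alone, so no per-failure-point labels are needed.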

From: Kanchana P Sridhar
Subject: [PATCH v5 12/12] mm: zswap: Compress batching with Intel IAA in zswap_store() of large folios.
Date: Fri, 20 Dec 2024 22:31:19 -0800
Message-Id: <20241221063119.29140-13-kanchana.p.sridhar@intel.com>

zswap_compress_folio() is modified to detect whether the pool's acomp_ctx
has more than one request ("nr_reqs" > 1). This is the case if the cpu
onlining code has allocated batching resources in the acomp_ctx based on
the queries to acomp_has_async_batching() and crypto_acomp_batch_size().
If multiple "nr_reqs" are available in the acomp_ctx, compress batching
can be used with a batch size of "acomp_ctx->nr_reqs".

If compress batching can be used with the given zswap pool, and the folio
has more than one page, zswap_compress_folio() invokes the newly added
zswap_batch_compress() procedure to compress and store the folio in
batches of "acomp_ctx->nr_reqs" pages.

zswap_batch_compress() calls crypto_acomp_batch_compress() to compress
each batch of (up to) "acomp_ctx->nr_reqs" pages. The iaa_crypto driver
compresses each batch of pages in parallel in the Intel IAA hardware,
using 'async' mode and request chaining.

Hence, zswap_batch_compress() performs the same work for a batch that
zswap_compress() performs for a single page. It returns true if the batch
was successfully compressed and stored, and false otherwise.

If the pool does not support compress batching, zswap_compress_folio()
calls zswap_compress() for each individual page in the folio, as before.
Signed-off-by: Kanchana P Sridhar
---
 mm/zswap.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 4 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 1be0f1807bfc..f336fafe24c4 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1467,17 +1467,118 @@ static void shrink_worker(struct work_struct *w)
 * main API
 **********************************/
 
+static bool zswap_batch_compress(struct folio *folio,
+				 long index,
+				 unsigned int batch_size,
+				 struct zswap_entry *entries[],
+				 struct zswap_pool *pool,
+				 struct crypto_acomp_ctx *acomp_ctx)
+{
+	int comp_errors[ZSWAP_MAX_BATCH_SIZE] = { 0 };
+	unsigned int dlens[ZSWAP_MAX_BATCH_SIZE];
+	struct page *pages[ZSWAP_MAX_BATCH_SIZE];
+	unsigned int i, nr_batch_pages;
+	bool ret = true;
+
+	nr_batch_pages = min((unsigned int)(folio_nr_pages(folio) - index), batch_size);
+
+	for (i = 0; i < nr_batch_pages; ++i) {
+		pages[i] = folio_page(folio, index + i);
+		dlens[i] = PAGE_SIZE;
+	}
+
+	mutex_lock(&acomp_ctx->mutex);
+
+	/*
+	 * Batch compress @nr_batch_pages. If IAA is the compressor, the
+	 * hardware will compress @nr_batch_pages in parallel.
+	 */
+	ret = crypto_acomp_batch_compress(
+		acomp_ctx->reqs,
+		&acomp_ctx->wait,
+		pages,
+		acomp_ctx->buffers,
+		dlens,
+		comp_errors,
+		nr_batch_pages);
+
+	if (ret) {
+		/*
+		 * All batch pages were successfully compressed.
+		 * Store the pages in zpool.
+		 */
+		struct zpool *zpool = pool->zpool;
+		gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
+
+		if (zpool_malloc_support_movable(zpool))
+			gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
+
+		for (i = 0; i < nr_batch_pages; ++i) {
+			unsigned long handle;
+			char *buf;
+			int err;
+
+			err = zpool_malloc(zpool, dlens[i], gfp, &handle);
+
+			if (err) {
+				if (err == -ENOSPC)
+					zswap_reject_compress_poor++;
+				else
+					zswap_reject_alloc_fail++;
+
+				ret = false;
+				break;
+			}
+
+			buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
+			memcpy(buf, acomp_ctx->buffers[i], dlens[i]);
+			zpool_unmap_handle(zpool, handle);
+
+			entries[i]->handle = handle;
+			entries[i]->length = dlens[i];
+		}
+	} else {
+		/* Some batch pages had compression errors. */
+		for (i = 0; i < nr_batch_pages; ++i) {
+			if (comp_errors[i]) {
+				if (comp_errors[i] == -ENOSPC)
+					zswap_reject_compress_poor++;
+				else
+					zswap_reject_compress_fail++;
+			}
+		}
+	}
+
+	mutex_unlock(&acomp_ctx->mutex);
+
+	return ret;
+}
+
 static bool zswap_compress_folio(struct folio *folio,
 				 struct zswap_entry *entries[],
 				 struct zswap_pool *pool)
 {
 	long index, nr_pages = folio_nr_pages(folio);
+	struct crypto_acomp_ctx *acomp_ctx;
+	unsigned int batch_size;
 
-	for (index = 0; index < nr_pages; ++index) {
-		struct page *page = folio_page(folio, index);
+	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+	batch_size = acomp_ctx->nr_reqs;
 
-		if (!zswap_compress(page, entries[index], pool))
-			return false;
+	if ((batch_size > 1) && (nr_pages > 1)) {
+		for (index = 0; index < nr_pages; index += batch_size) {
+
+			if (!zswap_batch_compress(folio, index, batch_size,
+						  &entries[index], pool, acomp_ctx))
+				return false;
+		}
+	} else {
+		for (index = 0; index < nr_pages; ++index) {
+			struct page *page = folio_page(folio, index);
+
+			if (!zswap_compress(page, entries[index], pool))
+				return false;
+		}
 	}
 
 	return true;
-- 
2.27.0
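The dispatch in zswap_compress_folio() batches only when the context has more than one request and the folio has more than one page. zswap_batch_compress() then trims the final batch to min(folio_nr_pages(folio) - index, batch_size). The user-space model below is a hedged sketch of just that control flow. compress_batch(), compress_page() and compress_folio() are hypothetical stand-ins that only count calls; they do no compression and take no locks.

```c
#include <assert.h>

/* Hypothetical workers that only record how they were called. */
static unsigned int batch_calls, page_calls, pages_seen;

static int compress_batch(long index, unsigned int n)
{
	(void)index;
	assert(n > 0); /* the dispatch loop never produces an empty batch */
	batch_calls++;
	pages_seen += n;
	return 1;
}

static int compress_page(long index)
{
	(void)index;
	page_calls++;
	pages_seen++;
	return 1;
}

/*
 * Batch only when more than one request is available and the folio has
 * more than one page; otherwise fall back to one page at a time.
 */
static int compress_folio(long nr_pages, unsigned int batch_size)
{
	long index;

	if (batch_size > 1 && nr_pages > 1) {
		for (index = 0; index < nr_pages; index += batch_size) {
			/* The last batch may be short. */
			long left = nr_pages - index;
			unsigned int n = left < (long)batch_size ? (unsigned int)left : batch_size;

			if (!compress_batch(index, n))
				return 0;
		}
	} else {
		for (index = 0; index < nr_pages; ++index)
			if (!compress_page(index))
				return 0;
	}
	return 1;
}
```

For a 20-page folio and a batch size of 8, this model makes three batch calls covering 8, 8 and 4 pages. With a batch size of 1 it makes 20 per-page calls instead.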