crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works

Posted by Herbert Xu 6 months, 3 weeks ago
On Wed, May 21, 2025 at 03:58:47PM +0200, Corentin Labbe wrote:
>
> It still fails:
> http://kernel.montjoie.ovh/479319.log

I think we've made progress.  All the simple hashes have passed
and now we're only failing on hmac.  So it looks like it was the
coherent memory being incoherent.

Adding the mvebu maintainers to see if they can shed any light
on why memory returned by dma_alloc_coherent results in DMA
corruption but kmalloc + dma_map_single works correctly.
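
For reference, a minimal sketch of the two allocation strategies being
compared (the function names and the buffer size are placeholders for
this sketch, not the driver's own identifiers):

	#include <linux/dma-mapping.h>
	#include <linux/slab.h>

	#define CACHE_SIZE	64	/* stand-in for CESA_MAX_HASH_BLOCK_SIZE */

	/* Coherent path: CPU stores should be visible to the device
	 * without any explicit sync, which is what appears to be
	 * failing here. */
	static void *alloc_cache_coherent(struct device *dev, dma_addr_t *dma)
	{
		return dma_alloc_coherent(dev, CACHE_SIZE, dma, GFP_KERNEL);
	}

	/* Streaming path: kmalloc memory is cacheable, and
	 * dma_map_single() does the cache maintenance before the
	 * device reads it. */
	static void *alloc_cache_streaming(struct device *dev, dma_addr_t *dma)
	{
		void *buf = kmalloc(CACHE_SIZE, GFP_KERNEL);

		if (!buf)
			return NULL;

		*dma = dma_map_single(dev, buf, CACHE_SIZE, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, *dma)) {
			kfree(buf);
			return NULL;
		}

		return buf;
	}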

Also adding Christophe Hellwig as he was the last person to touch
mach-mvebu/coherency.c.

The code in question is in drivers/crypto/marvell/cesa/hash.c
mv_cesa_ahash_dma_add_cache:

	/* This just calls dma_pool_alloc. */
	ret = mv_cesa_ahash_dma_alloc_cache(ahashdreq, flags);
	if (ret)
		return ret;

	memcpy(ahashdreq->cache, creq->cache, creq->cache_ptr);
	/* Pass ahashdreq->cache_dma to hardware. */

At this point it appears that the DMA memory is corrupted and the
hardware returns a bogus hash.  We have tried replacing dma_pool_alloc
with kmalloc + dma_map_single and it works correctly.

Corentin, could you also try the wmb patch and see if that works
without the dma_map_single patch?

https://lore.kernel.org/linux-crypto/aC1fY6IP-8MzVIbx@gondor.apana.org.au/

> but I still have all your old patches stacked; perhaps you could push a branch somewhere to be sure?
> current state is: http://kernel.montjoie.ovh/cesa.diff

You can drop everything except the last patch + the printk
patches.

So here is the latest debugging patch with dma_map_single on top
of cryptodev.  Note that the partial hash mismatch code is buggy
but it doesn't matter because it still prints enough info for us
to interpret.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..5c46cd267789 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -49,8 +49,7 @@ mv_cesa_ahash_req_iter_next_op(struct mv_cesa_ahash_dma_iter *iter)
 static inline int
 mv_cesa_ahash_dma_alloc_cache(struct mv_cesa_ahash_dma_req *req, gfp_t flags)
 {
-	req->cache = dma_pool_alloc(cesa_dev->dma->cache_pool, flags,
-				    &req->cache_dma);
+	req->cache = kmalloc(CESA_MAX_HASH_BLOCK_SIZE, flags);
 	if (!req->cache)
 		return -ENOMEM;
 
@@ -63,8 +62,8 @@ mv_cesa_ahash_dma_free_cache(struct mv_cesa_ahash_dma_req *req)
 	if (!req->cache)
 		return;
 
-	dma_pool_free(cesa_dev->dma->cache_pool, req->cache,
-		      req->cache_dma);
+	dma_unmap_single(cesa_dev->dev, req->cache_dma, CESA_MAX_HASH_BLOCK_SIZE, DMA_TO_DEVICE);
+	kfree(req->cache);
 }
 
 static int mv_cesa_ahash_dma_alloc_padding(struct mv_cesa_ahash_dma_req *req,
@@ -533,6 +532,13 @@ mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
 
 	memcpy(ahashdreq->cache, creq->cache, creq->cache_ptr);
 
+	ahashdreq->cache_dma = dma_map_single(cesa_dev->dev, ahashdreq->cache, CESA_MAX_HASH_BLOCK_SIZE, DMA_TO_DEVICE);
+	if (dma_mapping_error(cesa_dev->dev, ahashdreq->cache_dma)) {
+		dev_err(cesa_dev->dev, "dma_map_single failed\n");
+		kfree(ahashdreq->cache);
+		return -ENOMEM;
+	}
+
 	return mv_cesa_dma_add_data_transfer(chain,
 					     CESA_SA_DATA_SRAM_OFFSET,
 					     ahashdreq->cache_dma,
diff --git a/drivers/crypto/marvell/cesa/cesa.c b/drivers/crypto/marvell/cesa/cesa.c
index 9c21f5d835d2..fd7f43575cb2 100644
--- a/drivers/crypto/marvell/cesa/cesa.c
+++ b/drivers/crypto/marvell/cesa/cesa.c
@@ -127,6 +127,8 @@ static irqreturn_t mv_cesa_int(int irq, void *priv)
 		if (!(status & mask))
 			break;
 
+		pr_err("mv_cesa_int: %d 0x%x 0x%x\n", engine->id, status, mask);
+
 		/*
 		 * TODO: avoid clearing the FPGA_INT_STATUS if this not
 		 * relevant on some platforms.
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..ff0735aaed7d 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -397,6 +397,8 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 	}
 
 	atomic_sub(ahashreq->nbytes, &engine->load);
+
+	pr_err("mv_cesa_ahash_complete: %d 0x%lx\n", engine->id, (unsigned long)ahashreq);
 }
 
 static void mv_cesa_ahash_prepare(struct crypto_async_request *req,
@@ -418,6 +420,8 @@ static void mv_cesa_ahash_req_cleanup(struct crypto_async_request *req)
 	struct ahash_request *ahashreq = ahash_request_cast(req);
 	struct mv_cesa_ahash_req *creq = ahash_request_ctx(ahashreq);
 
+	pr_err("mv_cesa_ahash_req_cleanup: %d 0x%lx\n", creq->base.engine->id, (unsigned long)ahashreq);
+
 	if (creq->last_req)
 		mv_cesa_ahash_last_cleanup(ahashreq);
 
@@ -796,6 +800,7 @@ static int mv_cesa_ahash_queue_req(struct ahash_request *req)
 	engine = mv_cesa_select_engine(req->nbytes);
 	mv_cesa_ahash_prepare(&req->base, engine);
 
+	pr_err("mv_cesa_ahash_queue_req: %d 0x%lx %d %d\n", engine->id, (unsigned long)req, req->nbytes, creq->last_req);
 	ret = mv_cesa_queue_req(&req->base, &creq->base);
 
 	if (mv_cesa_req_needs_cleanup(&req->base, ret))
diff --git a/drivers/crypto/marvell/cesa/tdma.c b/drivers/crypto/marvell/cesa/tdma.c
index 243305354420..55860b480dd6 100644
--- a/drivers/crypto/marvell/cesa/tdma.c
+++ b/drivers/crypto/marvell/cesa/tdma.c
@@ -47,6 +47,8 @@ void mv_cesa_dma_step(struct mv_cesa_req *dreq)
 	engine->chain_hw.last = dreq->chain.last;
 	spin_unlock_bh(&engine->lock);
 
+	pr_err("mv_cesa_dma_step: %d 0x%lx 0x%lx 0x%lx\n", engine->id, (unsigned long)dreq, (unsigned long)dreq->chain.first->cur_dma, (unsigned long)dreq->chain.last->cur_dma);
+
 	writel_relaxed(0, engine->regs + CESA_SA_CFG);
 
 	mv_cesa_set_int_mask(engine, CESA_SA_INT_ACC0_IDMA_DONE);
@@ -137,6 +139,7 @@ int mv_cesa_tdma_process(struct mv_cesa_engine *engine, u32 status)
 	int res = 0;
 
 	tdma_cur = readl(engine->regs + CESA_TDMA_CUR);
+	pr_err("mv_cesa_tdma_process: %d 0x%lx\n", engine->id, (unsigned long)tdma_cur);
 
 	for (tdma = engine->chain_hw.first; tdma; tdma = next) {
 		spin_lock_bh(&engine->lock);
@@ -186,6 +189,8 @@ int mv_cesa_tdma_process(struct mv_cesa_engine *engine, u32 status)
 			break;
 	}
 
+	pr_err("mv_cesa_tdma_process: %d %d 0x%lx\n", engine->id, res, (unsigned long)req);
+
 	/*
 	 * Save the last request in error to engine->req, so that the core
 	 * knows which request was faulty
diff --git a/crypto/asymmetric_keys/pkcs7_verify.c b/crypto/asymmetric_keys/pkcs7_verify.c
index f0d4ff3c20a8..53f71391a2c5 100644
--- a/crypto/asymmetric_keys/pkcs7_verify.c
+++ b/crypto/asymmetric_keys/pkcs7_verify.c
@@ -23,9 +23,7 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
 			struct pkcs7_signed_info *sinfo)
 {
 	struct public_key_signature *sig = sinfo->sig;
-	struct crypto_shash *tfm;
-	struct shash_desc *desc;
-	size_t desc_size;
+	struct crypto_ahash *tfm;
 	int ret;
 
 	kenter(",%u,%s", sinfo->index, sinfo->sig->hash_algo);
@@ -40,27 +38,23 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
 	/* Allocate the hashing algorithm we're going to need and find out how
 	 * big the hash operational data will be.
 	 */
-	tfm = crypto_alloc_shash(sinfo->sig->hash_algo, 0, 0);
+	tfm = crypto_alloc_ahash(sinfo->sig->hash_algo, 0, CRYPTO_ALG_ASYNC);
 	if (IS_ERR(tfm))
 		return (PTR_ERR(tfm) == -ENOENT) ? -ENOPKG : PTR_ERR(tfm);
 
-	desc_size = crypto_shash_descsize(tfm) + sizeof(*desc);
-	sig->digest_size = crypto_shash_digestsize(tfm);
+	sig->digest_size = crypto_ahash_digestsize(tfm);
 
 	ret = -ENOMEM;
 	sig->digest = kmalloc(sig->digest_size, GFP_KERNEL);
 	if (!sig->digest)
 		goto error_no_desc;
 
-	desc = kzalloc(desc_size, GFP_KERNEL);
-	if (!desc)
-		goto error_no_desc;
-
-	desc->tfm   = tfm;
+	HASH_REQUEST_ON_STACK(req, tfm);
+	ahash_request_set_flags(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
 
 	/* Digest the message [RFC2315 9.3] */
-	ret = crypto_shash_digest(desc, pkcs7->data, pkcs7->data_len,
-				  sig->digest);
+	ahash_request_set_virt(req, pkcs7->data, sig->digest, pkcs7->data_len);
+	ret = crypto_ahash_digest(req);
 	if (ret < 0)
 		goto error;
 	pr_devel("MsgDigest = [%*ph]\n", 8, sig->digest);
@@ -100,24 +94,26 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
 		 */
 		memset(sig->digest, 0, sig->digest_size);
 
-		ret = crypto_shash_init(desc);
+		ret = crypto_ahash_init(req);
 		if (ret < 0)
 			goto error;
 		tag = ASN1_CONS_BIT | ASN1_SET;
-		ret = crypto_shash_update(desc, &tag, 1);
+		ahash_request_set_virt(req, &tag, NULL, 1);
+		ret = crypto_ahash_update(req);
 		if (ret < 0)
 			goto error;
-		ret = crypto_shash_finup(desc, sinfo->authattrs,
-					 sinfo->authattrs_len, sig->digest);
+		ahash_request_set_virt(req, sinfo->authattrs, sig->digest,
+				       sinfo->authattrs_len);
+		ret = crypto_ahash_finup(req);
 		if (ret < 0)
 			goto error;
 		pr_devel("AADigest = [%*ph]\n", 8, sig->digest);
 	}
 
 error:
-	kfree(desc);
+	ahash_request_free(req);
 error_no_desc:
-	crypto_free_shash(tfm);
+	crypto_free_ahash(tfm);
 	kleave(" = %d", ret);
 	return ret;
 }
Re: crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works
Posted by Herbert Xu 6 months, 3 weeks ago
On Thu, May 22, 2025 at 11:01:36AM +0800, Herbert Xu wrote:
>
> I think we've made progress.  All the simple hashes have passed
> and now we're only failing on hmac.  So it looks like it was the
> coherent memory being incoherent.

I've analysed the hmac errors and the good news is that they
all appear to be caused by a genuine bug in the driver.  When
the final update is zero-length, which the hardware can't handle,
the driver's fallback code is broken for hmac.
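
For illustration, here is roughly how such a request can be produced
from the ahash API side ("hmac(sha256)" and the one-block length are
arbitrary choices for this sketch, not taken from the failing tests):

	#include <crypto/hash.h>
	#include <linux/crypto.h>
	#include <linux/scatterlist.h>

	/* Feed the data entirely through update() so that the trailing
	 * final() carries zero new bytes -- the case the fallback
	 * mishandles. */
	static int hmac_zero_len_final(const u8 *key, unsigned int keylen,
				       const u8 *data, u8 *out)
	{
		struct crypto_ahash *tfm;
		struct ahash_request *req;
		struct scatterlist sg;
		DECLARE_CRYPTO_WAIT(wait);
		int err;

		tfm = crypto_alloc_ahash("hmac(sha256)", 0, 0);
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		err = crypto_ahash_setkey(tfm, key, keylen);
		if (err)
			goto out_tfm;

		req = ahash_request_alloc(tfm, GFP_KERNEL);
		if (!req) {
			err = -ENOMEM;
			goto out_tfm;
		}
		ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
					   crypto_req_done, &wait);

		/* One full SHA-256 block goes in via update()... */
		sg_init_one(&sg, data, 64);
		ahash_request_set_crypt(req, &sg, out, 64);
		err = crypto_wait_req(crypto_ahash_init(req), &wait) ?:
		      crypto_wait_req(crypto_ahash_update(req), &wait);
		if (err)
			goto out_req;

		/* ...and the final step then has nothing left to hash. */
		ahash_request_set_crypt(req, NULL, out, 0);
		err = crypto_wait_req(crypto_ahash_final(req), &wait);

	out_req:
		ahash_request_free(req);
	out_tfm:
		crypto_free_ahash(tfm);
		return err;
	}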

I'm working on a fix.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works
Posted by Herbert Xu 6 months, 3 weeks ago
On Thu, May 22, 2025 at 11:01:36AM +0800, Herbert Xu wrote:
> 
> So here is the latest debugging patch with dma_map_single on top
> of cryptodev.  Note that the partial hash mismatch code is buggy
> but it doesn't matter because it still prints enough info for us
> to interpret.

Oops, I screwed up that patch.  Here is the corrected version.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..5c46cd267789 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -49,8 +49,7 @@ mv_cesa_ahash_req_iter_next_op(struct mv_cesa_ahash_dma_iter *iter)
 static inline int
 mv_cesa_ahash_dma_alloc_cache(struct mv_cesa_ahash_dma_req *req, gfp_t flags)
 {
-	req->cache = dma_pool_alloc(cesa_dev->dma->cache_pool, flags,
-				    &req->cache_dma);
+	req->cache = kmalloc(CESA_MAX_HASH_BLOCK_SIZE, flags);
 	if (!req->cache)
 		return -ENOMEM;
 
@@ -63,8 +62,8 @@ mv_cesa_ahash_dma_free_cache(struct mv_cesa_ahash_dma_req *req)
 	if (!req->cache)
 		return;
 
-	dma_pool_free(cesa_dev->dma->cache_pool, req->cache,
-		      req->cache_dma);
+	dma_unmap_single(cesa_dev->dev, req->cache_dma, CESA_MAX_HASH_BLOCK_SIZE, DMA_TO_DEVICE);
+	kfree(req->cache);
 }
 
 static int mv_cesa_ahash_dma_alloc_padding(struct mv_cesa_ahash_dma_req *req,
@@ -533,6 +532,13 @@ mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
 
 	memcpy(ahashdreq->cache, creq->cache, creq->cache_ptr);
 
+	ahashdreq->cache_dma = dma_map_single(cesa_dev->dev, ahashdreq->cache, CESA_MAX_HASH_BLOCK_SIZE, DMA_TO_DEVICE);
+	if (dma_mapping_error(cesa_dev->dev, ahashdreq->cache_dma)) {
+		dev_err(cesa_dev->dev, "dma_map_single failed\n");
+		kfree(ahashdreq->cache);
+		return -ENOMEM;
+	}
+
 	return mv_cesa_dma_add_data_transfer(chain,
 					     CESA_SA_DATA_SRAM_OFFSET,
 					     ahashdreq->cache_dma,
diff --git a/drivers/crypto/marvell/cesa/cesa.c b/drivers/crypto/marvell/cesa/cesa.c
index 9c21f5d835d2..fd7f43575cb2 100644
--- a/drivers/crypto/marvell/cesa/cesa.c
+++ b/drivers/crypto/marvell/cesa/cesa.c
@@ -127,6 +127,8 @@ static irqreturn_t mv_cesa_int(int irq, void *priv)
 		if (!(status & mask))
 			break;
 
+		pr_err("mv_cesa_int: %d 0x%x 0x%x\n", engine->id, status, mask);
+
 		/*
 		 * TODO: avoid clearing the FPGA_INT_STATUS if this not
 		 * relevant on some platforms.
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..ff0735aaed7d 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -397,6 +397,8 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 	}
 
 	atomic_sub(ahashreq->nbytes, &engine->load);
+
+	pr_err("mv_cesa_ahash_complete: %d 0x%lx\n", engine->id, (unsigned long)ahashreq);
 }
 
 static void mv_cesa_ahash_prepare(struct crypto_async_request *req,
@@ -418,6 +420,8 @@ static void mv_cesa_ahash_req_cleanup(struct crypto_async_request *req)
 	struct ahash_request *ahashreq = ahash_request_cast(req);
 	struct mv_cesa_ahash_req *creq = ahash_request_ctx(ahashreq);
 
+	pr_err("mv_cesa_ahash_req_cleanup: %d 0x%lx\n", creq->base.engine->id, (unsigned long)ahashreq);
+
 	if (creq->last_req)
 		mv_cesa_ahash_last_cleanup(ahashreq);
 
@@ -796,6 +800,7 @@ static int mv_cesa_ahash_queue_req(struct ahash_request *req)
 	engine = mv_cesa_select_engine(req->nbytes);
 	mv_cesa_ahash_prepare(&req->base, engine);
 
+	pr_err("mv_cesa_ahash_queue_req: %d 0x%lx %d %d\n", engine->id, (unsigned long)req, req->nbytes, creq->last_req);
 	ret = mv_cesa_queue_req(&req->base, &creq->base);
 
 	if (mv_cesa_req_needs_cleanup(&req->base, ret))
diff --git a/drivers/crypto/marvell/cesa/tdma.c b/drivers/crypto/marvell/cesa/tdma.c
index 243305354420..55860b480dd6 100644
--- a/drivers/crypto/marvell/cesa/tdma.c
+++ b/drivers/crypto/marvell/cesa/tdma.c
@@ -47,6 +47,8 @@ void mv_cesa_dma_step(struct mv_cesa_req *dreq)
 	engine->chain_hw.last = dreq->chain.last;
 	spin_unlock_bh(&engine->lock);
 
+	pr_err("mv_cesa_dma_step: %d 0x%lx 0x%lx 0x%lx\n", engine->id, (unsigned long)dreq, (unsigned long)dreq->chain.first->cur_dma, (unsigned long)dreq->chain.last->cur_dma);
+
 	writel_relaxed(0, engine->regs + CESA_SA_CFG);
 
 	mv_cesa_set_int_mask(engine, CESA_SA_INT_ACC0_IDMA_DONE);
@@ -137,6 +139,7 @@ int mv_cesa_tdma_process(struct mv_cesa_engine *engine, u32 status)
 	int res = 0;
 
 	tdma_cur = readl(engine->regs + CESA_TDMA_CUR);
+	pr_err("mv_cesa_tdma_process: %d 0x%lx\n", engine->id, (unsigned long)tdma_cur);
 
 	for (tdma = engine->chain_hw.first; tdma; tdma = next) {
 		spin_lock_bh(&engine->lock);
@@ -186,6 +189,8 @@ int mv_cesa_tdma_process(struct mv_cesa_engine *engine, u32 status)
 			break;
 	}
 
+	pr_err("mv_cesa_tdma_process: %d %d 0x%lx\n", engine->id, res, (unsigned long)req);
+
 	/*
 	 * Save the last request in error to engine->req, so that the core
 	 * knows which request was faulty
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..230501fe843b 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -374,6 +374,12 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 
 		memcpy(ahashreq->result, data, digsize);
 	} else {
+		struct {
+			u32 digest[8];
+			u64 len;
+		} state;
+
+		memcpy(state.digest, creq->state, digsize);
 		for (i = 0; i < digsize / 4; i++)
 			creq->state[i] = readl_relaxed(engine->regs +
 						       CESA_IVDIG(i));
@@ -393,6 +399,21 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 				for (i = 0; i < digsize / 4; i++)
 					result[i] = cpu_to_be32(creq->state[i]);
 			}
+		} else {
+			HASH_FBREQ_ON_STACK(fbreq, ahashreq);
+
+			crypto_ahash_import_core(fbreq, &state);
+			crypto_ahash_update(fbreq);
+			crypto_ahash_export_core(fbreq, &state);
+			if (memcmp(state.digest, creq->state, digsize)) {
+				pr_err("mv_cesa_ahash_complete partial hash mismatch\n");
+				print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET,
+						16, 1,
+						state.digest, digsize, false);
+				print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET,
+						16, 1,
+						creq->state, digsize, false);
+			}
 		}
 	}
Re: crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works
Posted by Corentin Labbe 6 months, 3 weeks ago
On Thu, May 22, 2025 at 03:38:20PM +0800, Herbert Xu wrote:
> On Thu, May 22, 2025 at 11:01:36AM +0800, Herbert Xu wrote:
> > 
> > So here is the latest debugging patch with dma_map_single on top
> > of cryptodev.  Note that the partial hash mismatch code is buggy
> > but it doesn't matter because it still prints enough info for us
> > to interpret.
> 
> Oops, I screwed up that patch.  Here is the corrected version.
> 

Hello

Here is the result:
http://kernel.montjoie.ovh/479404.log

I built it with your "crypto: marvell/cesa - Fix engine load inaccuracy" patch applied as well

Thanks
Regards
Re: crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works
Posted by Herbert Xu 6 months, 2 weeks ago
On Thu, May 22, 2025 at 10:07:54PM +0200, Corentin Labbe wrote:
>
> Here is the result:
> http://kernel.montjoie.ovh/479404.log
> 
> I built it with your "crypto: marvell/cesa - Fix engine load inaccuracy" patch applied as well

Please try this patch on top of the current mainline tree.

I've force-enabled the software finalisation code and switched it
over to kmalloc + dma_map_single.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..e5b1d6a9add5 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -49,8 +49,7 @@ mv_cesa_ahash_req_iter_next_op(struct mv_cesa_ahash_dma_iter *iter)
 static inline int
 mv_cesa_ahash_dma_alloc_cache(struct mv_cesa_ahash_dma_req *req, gfp_t flags)
 {
-	req->cache = dma_pool_alloc(cesa_dev->dma->cache_pool, flags,
-				    &req->cache_dma);
+	req->cache = kmalloc(CESA_MAX_HASH_BLOCK_SIZE, flags);
 	if (!req->cache)
 		return -ENOMEM;
 
@@ -63,18 +62,14 @@ mv_cesa_ahash_dma_free_cache(struct mv_cesa_ahash_dma_req *req)
 	if (!req->cache)
 		return;
 
-	dma_pool_free(cesa_dev->dma->cache_pool, req->cache,
-		      req->cache_dma);
+	dma_unmap_single(cesa_dev->dev, req->cache_dma, CESA_MAX_HASH_BLOCK_SIZE, DMA_TO_DEVICE);
+	kfree(req->cache);
 }
 
 static int mv_cesa_ahash_dma_alloc_padding(struct mv_cesa_ahash_dma_req *req,
 					   gfp_t flags)
 {
-	if (req->padding)
-		return 0;
-
-	req->padding = dma_pool_alloc(cesa_dev->dma->padding_pool, flags,
-				      &req->padding_dma);
+	req->padding = kmalloc(72, flags);
 	if (!req->padding)
 		return -ENOMEM;
 
@@ -86,9 +81,8 @@ static void mv_cesa_ahash_dma_free_padding(struct mv_cesa_ahash_dma_req *req)
 	if (!req->padding)
 		return;
 
-	dma_pool_free(cesa_dev->dma->padding_pool, req->padding,
-		      req->padding_dma);
-	req->padding = NULL;
+	dma_unmap_single(cesa_dev->dev, req->padding_dma, 72, DMA_TO_DEVICE);
+	kfree(req->padding);
 }
 
 static inline void mv_cesa_ahash_dma_last_cleanup(struct ahash_request *req)
@@ -533,6 +527,13 @@ mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
 
 	memcpy(ahashdreq->cache, creq->cache, creq->cache_ptr);
 
+	ahashdreq->cache_dma = dma_map_single(cesa_dev->dev, ahashdreq->cache, CESA_MAX_HASH_BLOCK_SIZE, DMA_TO_DEVICE);
+	if (dma_mapping_error(cesa_dev->dev, ahashdreq->cache_dma)) {
+		dev_err(cesa_dev->dev, "dma_map_single failed\n");
+		kfree(ahashdreq->cache);
+		return -ENOMEM;
+	}
+
 	return mv_cesa_dma_add_data_transfer(chain,
 					     CESA_SA_DATA_SRAM_OFFSET,
 					     ahashdreq->cache_dma,
@@ -556,7 +557,7 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
 	 * If the transfer is smaller than our maximum length, and we have
 	 * some data outstanding, we can ask the engine to finish the hash.
 	 */
-	if (creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX && frag_len) {
+	if (0 && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX && frag_len) {
 		op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
 					  flags);
 		if (IS_ERR(op))
@@ -588,6 +589,13 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
 
 	trailerlen = mv_cesa_ahash_pad_req(creq, ahashdreq->padding);
 
+	ahashdreq->padding_dma = dma_map_single(cesa_dev->dev, ahashdreq->padding, 72, DMA_TO_DEVICE);
+	if (dma_mapping_error(cesa_dev->dev, ahashdreq->padding_dma)) {
+		dev_err(cesa_dev->dev, "dma_map_single failed\n");
+		kfree(ahashdreq->padding);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	len = min(CESA_SA_SRAM_PAYLOAD_SIZE - frag_len, trailerlen);
 	if (len) {
 		ret = mv_cesa_dma_add_data_transfer(chain,
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..230501fe843b 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -374,6 +374,12 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 
 		memcpy(ahashreq->result, data, digsize);
 	} else {
+		struct {
+			u32 digest[8];
+			u64 len;
+		} state;
+
+		memcpy(state.digest, creq->state, digsize);
 		for (i = 0; i < digsize / 4; i++)
 			creq->state[i] = readl_relaxed(engine->regs +
 						       CESA_IVDIG(i));
@@ -393,6 +399,21 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 				for (i = 0; i < digsize / 4; i++)
 					result[i] = cpu_to_be32(creq->state[i]);
 			}
+		} else {
+			HASH_FBREQ_ON_STACK(fbreq, ahashreq);
+
+			crypto_ahash_import_core(fbreq, &state);
+			crypto_ahash_update(fbreq);
+			crypto_ahash_export_core(fbreq, &state);
+			if (memcmp(state.digest, creq->state, digsize)) {
+				pr_err("mv_cesa_ahash_complete partial hash mismatch\n");
+				print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET,
+						16, 1,
+						state.digest, digsize, false);
+				print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET,
+						16, 1,
+						creq->state, digsize, false);
+			}
 		}
 	}
 
diff --git a/drivers/crypto/marvell/cesa/cesa.c b/drivers/crypto/marvell/cesa/cesa.c
index 9c21f5d835d2..fd7f43575cb2 100644
--- a/drivers/crypto/marvell/cesa/cesa.c
+++ b/drivers/crypto/marvell/cesa/cesa.c
@@ -127,6 +127,8 @@ static irqreturn_t mv_cesa_int(int irq, void *priv)
 		if (!(status & mask))
 			break;
 
+		pr_err("mv_cesa_int: %d 0x%x 0x%x\n", engine->id, status, mask);
+
 		/*
 		 * TODO: avoid clearing the FPGA_INT_STATUS if this not
 		 * relevant on some platforms.
diff --git a/drivers/crypto/marvell/cesa/hash.c b/drivers/crypto/marvell/cesa/hash.c
index 6815eddc9068..ff0735aaed7d 100644
--- a/drivers/crypto/marvell/cesa/hash.c
+++ b/drivers/crypto/marvell/cesa/hash.c
@@ -397,6 +397,8 @@ static void mv_cesa_ahash_complete(struct crypto_async_request *req)
 	}
 
 	atomic_sub(ahashreq->nbytes, &engine->load);
+
+	pr_err("mv_cesa_ahash_complete: %d 0x%lx\n", engine->id, (unsigned long)ahashreq);
 }
 
 static void mv_cesa_ahash_prepare(struct crypto_async_request *req,
@@ -418,6 +420,8 @@ static void mv_cesa_ahash_req_cleanup(struct crypto_async_request *req)
 	struct ahash_request *ahashreq = ahash_request_cast(req);
 	struct mv_cesa_ahash_req *creq = ahash_request_ctx(ahashreq);
 
+	pr_err("mv_cesa_ahash_req_cleanup: %d 0x%lx\n", creq->base.engine->id, (unsigned long)ahashreq);
+
 	if (creq->last_req)
 		mv_cesa_ahash_last_cleanup(ahashreq);
 
@@ -796,6 +800,7 @@ static int mv_cesa_ahash_queue_req(struct ahash_request *req)
 	engine = mv_cesa_select_engine(req->nbytes);
 	mv_cesa_ahash_prepare(&req->base, engine);
 
+	pr_err("mv_cesa_ahash_queue_req: %d 0x%lx %d %d\n", engine->id, (unsigned long)req, req->nbytes, creq->last_req);
 	ret = mv_cesa_queue_req(&req->base, &creq->base);
 
 	if (mv_cesa_req_needs_cleanup(&req->base, ret))
diff --git a/drivers/crypto/marvell/cesa/tdma.c b/drivers/crypto/marvell/cesa/tdma.c
index 243305354420..55860b480dd6 100644
--- a/drivers/crypto/marvell/cesa/tdma.c
+++ b/drivers/crypto/marvell/cesa/tdma.c
@@ -47,6 +47,8 @@ void mv_cesa_dma_step(struct mv_cesa_req *dreq)
 	engine->chain_hw.last = dreq->chain.last;
 	spin_unlock_bh(&engine->lock);
 
+	pr_err("mv_cesa_dma_step: %d 0x%lx 0x%lx 0x%lx\n", engine->id, (unsigned long)dreq, (unsigned long)dreq->chain.first->cur_dma, (unsigned long)dreq->chain.last->cur_dma);
+
 	writel_relaxed(0, engine->regs + CESA_SA_CFG);
 
 	mv_cesa_set_int_mask(engine, CESA_SA_INT_ACC0_IDMA_DONE);
@@ -137,6 +139,7 @@ int mv_cesa_tdma_process(struct mv_cesa_engine *engine, u32 status)
 	int res = 0;
 
 	tdma_cur = readl(engine->regs + CESA_TDMA_CUR);
+	pr_err("mv_cesa_tdma_process: %d 0x%lx\n", engine->id, (unsigned long)tdma_cur);
 
 	for (tdma = engine->chain_hw.first; tdma; tdma = next) {
 		spin_lock_bh(&engine->lock);
@@ -186,6 +189,8 @@ int mv_cesa_tdma_process(struct mv_cesa_engine *engine, u32 status)
 			break;
 	}
 
+	pr_err("mv_cesa_tdma_process: %d %d 0x%lx\n", engine->id, res, (unsigned long)req);
+
 	/*
 	 * Save the last request in error to engine->req, so that the core
 	 * knows which request was faulty
Re: crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works
Posted by Corentin Labbe 6 months, 2 weeks ago
On Wed, May 28, 2025 at 05:58:13PM +0800, Herbert Xu wrote:
> On Thu, May 22, 2025 at 10:07:54PM +0200, Corentin Labbe wrote:
> >
> > Here is the result:
> > http://kernel.montjoie.ovh/479404.log
> > 
> > I built it with your "crypto: marvell/cesa - Fix engine load inaccuracy" patch applied as well
> 
> Please try this patch on top of the current mainline tree.
> 
> I've force-enabled the software finalisation code and switched it
> over to kmalloc + dma_map_single.
> 
> Thanks,
> -- 

Hello

I have tried it on top of torvalds/master 90b83efa6701656e02c86e7df2cb1765ea602d07

https://kernel.montjoie.ovh/479785.log

Regards
Re: crypto: marvell/cesa - dma_alloc_coherent broken but kmalloc + dma_map_single works
Posted by Herbert Xu 6 months, 3 weeks ago
On Thu, May 22, 2025 at 10:07:54PM +0200, Corentin Labbe wrote:
>
> Here is the result:
> http://kernel.montjoie.ovh/479404.log

So the corruptions have come back, although somewhat less often
than before, as one algorithm managed to pass all the tests.

At least we've proven that adding the printks makes the errors
a lot more frequent :)

> I built it with your "crypto: marvell/cesa - Fix engine load inaccuracy" patch applied as well

That shouldn't affect things either way.

So now the corruptions are all on the final hash.

Can you boot this a few times to see if the errors are always
the same?  I'm particularly interested to see whether we've
managed to crush the errors on the partial hashes or whether it
was just pure luck.

Intriguingly the one error that went away (sha256) was actually
a partial hash corruption, and sha256 now passes all the tests.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt