[v3] Add support for read retry

[PATCH v3 1/2] mtd: spi-nand: Add read retry support

Posted by Cheng Ming Lin 1 year ago

From: Cheng Ming Lin <chengminglin@mxic.com.tw>

When the host ECC fails to correct a data error on a NAND device,
the host can set up a special read mode for data recovery (read
retry) that applies to the next read. Several retry levels can be
attempted until the data is recovered or considered definitively
lost.

Signed-off-by: Cheng Ming Lin <chengminglin@mxic.com.tw>
---
 drivers/mtd/nand/spi/core.c | 35 +++++++++++++++++++++++++++++++++--
 include/linux/mtd/spinand.h | 14 ++++++++++++++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 4d76f9f71a0e..3f72d94c09f3 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -671,11 +671,15 @@ static int spinand_mtd_regular_page_read(struct mtd_info *mtd, loff_t from,
 {
 	struct spinand_device *spinand = mtd_to_spinand(mtd);
 	struct nand_device *nand = mtd_to_nanddev(mtd);
+	struct mtd_ecc_stats old_stats;
 	struct nand_io_iter iter;
 	bool disable_ecc = false;
 	bool ecc_failed = false;
+	unsigned int retry_mode = 0;
 	int ret;
 
+	old_stats = mtd->ecc_stats;
+
 	if (ops->mode == MTD_OPS_RAW || !mtd->ooblayout)
 		disable_ecc = true;
 
@@ -687,18 +691,43 @@ static int spinand_mtd_regular_page_read(struct mtd_info *mtd, loff_t from,
 		if (ret)
 			break;
 
+read_retry:
 		ret = spinand_read_page(spinand, &iter.req);
 		if (ret < 0 && ret != -EBADMSG)
 			break;
 
-		if (ret == -EBADMSG)
+		if (ret == -EBADMSG && spinand->set_read_retry) {
+			if (spinand->read_retries && (++retry_mode < spinand->read_retries)) {
+				ret = spinand->set_read_retry(spinand, retry_mode);
+				if (ret < 0) {
+					ecc_failed = true;
+					return ret;
+				}
+
+				/* Reset ecc_stats; retry */
+				mtd->ecc_stats = old_stats;
+				goto read_retry;
+			} else {
+				/* No more retry modes; real failure */
+				ecc_failed = true;
+			}
+		} else if (ret == -EBADMSG) {
 			ecc_failed = true;
-		else
+		} else {
 			*max_bitflips = max_t(unsigned int, *max_bitflips, ret);
+		}
 
 		ret = 0;
 		ops->retlen += iter.req.datalen;
 		ops->oobretlen += iter.req.ooblen;
+
+		/* Reset to retry mode 0 */
+		if (retry_mode) {
+			retry_mode = 0;
+			ret = spinand->set_read_retry(spinand, retry_mode);
+			if (ret < 0)
+				return ret;
+		}
 	}
 
 	if (ecc_failed && !ret)
@@ -1268,6 +1297,8 @@ int spinand_match_and_init(struct spinand_device *spinand,
 		spinand->id.len = 1 + table[i].devid.len;
 		spinand->select_target = table[i].select_target;
 		spinand->set_cont_read = table[i].set_cont_read;
+		spinand->read_retries = table[i].read_retries;
+		spinand->set_read_retry = table[i].set_read_retry;
 
 		op = spinand_select_op_variant(spinand,
 					       info->op_variants.read_cache);
diff --git a/include/linux/mtd/spinand.h b/include/linux/mtd/spinand.h
index 702e5fb13dae..bbfef90135f5 100644
--- a/include/linux/mtd/spinand.h
+++ b/include/linux/mtd/spinand.h
@@ -339,6 +339,8 @@ struct spinand_ondie_ecc_conf {
  * @select_target: function used to select a target/die. Required only for
  *		   multi-die chips
  * @set_cont_read: enable/disable continuous cached reads
+ * @read_retries: the number of read retry modes supported
+ * @set_read_retry: enable/disable read retry for data recovery
  *
  * Each SPI NAND manufacturer driver should have a spinand_info table
  * describing all the chips supported by the driver.
@@ -359,6 +361,9 @@ struct spinand_info {
 			     unsigned int target);
 	int (*set_cont_read)(struct spinand_device *spinand,
 			     bool enable);
+	unsigned int read_retries;
+	int (*set_read_retry)(struct spinand_device *spinand,
+			     unsigned int read_retry);
 };
 
 #define SPINAND_ID(__method, ...)					\
@@ -387,6 +392,10 @@ struct spinand_info {
 #define SPINAND_CONT_READ(__set_cont_read)				\
 	.set_cont_read = __set_cont_read,
 
+#define SPINAND_READ_RETRY(__read_retries, __set_read_retry)		\
+	.read_retries = __read_retries,				\
+	.set_read_retry = __set_read_retry,
+
 #define SPINAND_INFO(__model, __id, __memorg, __eccreq, __op_variants,	\
 		     __flags, ...)					\
 	{								\
@@ -436,6 +445,8 @@ struct spinand_dirmap {
  *		A per-transfer check must of course be done to ensure it is
  *		actually relevant to enable this feature.
  * @set_cont_read: Enable/disable the continuous read feature
+ * @read_retries: the number of read retry modes supported
+ * @set_read_retry: Enable/disable the read retry feature
  * @priv: manufacturer private data
  */
 struct spinand_device {
@@ -469,6 +480,9 @@ struct spinand_device {
 	bool cont_read_possible;
 	int (*set_cont_read)(struct spinand_device *spinand,
 			     bool enable);
+	unsigned int read_retries;
+	int (*set_read_retry)(struct spinand_device *spinand,
+			     unsigned int retry_mode);
 };
 
 /**
-- 
2.25.1

Re: [PATCH v3 1/2] mtd: spi-nand: Add read retry support

Posted by Miquel Raynal 1 year ago

Hello Cheng,

> @@ -687,18 +691,43 @@ static int spinand_mtd_regular_page_read(struct mtd_info *mtd, loff_t from,
>  		if (ret)
>  			break;
>  
> +read_retry:
>  		ret = spinand_read_page(spinand, &iter.req);
>  		if (ret < 0 && ret != -EBADMSG)
>  			break;
>  
> -		if (ret == -EBADMSG)
> +		if (ret == -EBADMSG && spinand->set_read_retry) {
> +			if (spinand->read_retries && (++retry_mode < spinand->read_retries)) {

I believe the condition should be:

                        if (spinand->read_retries && (++retry_mode <= spinand->read_retries)) {

So if you have 5 retry modes, you can provide 5 in the manufacturer driver,
and not 6.
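
To make the off-by-one concrete, here is a minimal user-space sketch
(not kernel code) that counts how many retry modes each bound would
actually try for a page that never becomes readable. The helper name
retry_modes_tried() and its "inclusive" flag are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not in the patch): count how many retry modes get
 * programmed for a page that never becomes readable.
 */
static unsigned int retry_modes_tried(unsigned int read_retries, bool inclusive)
{
	unsigned int retry_mode = 0;
	unsigned int tried = 0;

	/* Same shape as the condition in the patch, with either bound. */
	while (read_retries &&
	       (inclusive ? ++retry_mode <= read_retries
			  : ++retry_mode < read_retries))
		tried++;	/* stands in for set_read_retry() + re-read */

	/* Neither bound can try more modes than the driver declared. */
	assert(tried <= read_retries);
	return tried;
}
```

With read_retries = 5, the strict bound tries modes 1 to 4 only, while
the inclusive bound tries modes 1 to 5.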

> +				ret = spinand->set_read_retry(spinand, retry_mode);
> +				if (ret < 0) {
> +					ecc_failed = true;
> +					return ret;

Shall we try to set the read_retry level to 0 upon:

      if (ret < 0 && retry_mode > 1)

?

> +				}
> +
> +				/* Reset ecc_stats; retry */
> +				mtd->ecc_stats = old_stats;
> +				goto read_retry;
> +			} else {
> +				/* No more retry modes; real failure */
> +				ecc_failed = true;
> +			}
> +		} else if (ret == -EBADMSG) {

Rest lgtm.

Thanks,
Miquèl

Re: [PATCH v3 1/2] mtd: spi-nand: Add read retry support

Posted by Cheng Ming Lin 12 months ago

Hi Miquel,

Miquel Raynal <miquel.raynal@bootlin.com> wrote on Fri, Feb 7, 2025 at 1:13 AM:
>
> Hello Cheng,
>
> > @@ -687,18 +691,43 @@ static int spinand_mtd_regular_page_read(struct mtd_info *mtd, loff_t from,
> >               if (ret)
> >                       break;
> >
> > +read_retry:
> >               ret = spinand_read_page(spinand, &iter.req);
> >               if (ret < 0 && ret != -EBADMSG)
> >                       break;
> >
> > -             if (ret == -EBADMSG)
> > +             if (ret == -EBADMSG && spinand->set_read_retry) {
> > +                     if (spinand->read_retries && (++retry_mode < spinand->read_retries)) {
>
> I believe the condition should be:
>
>                         if (spinand->read_retries && (++retry_mode <= spinand->read_retries)) {
>
> So if you have 5 retry modes, you can provide 5 in the manufacturer driver,
> and not 6.

This was originally based on the configuration in rawnand.
However, I agree that your proposed condition is a better approach.
Thank you for the suggestion.

>
> > +                             ret = spinand->set_read_retry(spinand, retry_mode);
> > +                             if (ret < 0) {
> > +                                     ecc_failed = true;
> > +                                     return ret;
>
> Shall we try to set the read_retry level to 0 upon:
>
>       if (ret < 0 && retry_mode > 1)
>
> ?

If we set the read_retry level to 0 at that point, and set_read_retry
fails when retry_mode equals 1, it won't return an error. This could
potentially mask an underlying issue.

>
> > +                             }
> > +
> > +                             /* Reset ecc_stats; retry */
> > +                             mtd->ecc_stats = old_stats;
> > +                             goto read_retry;
> > +                     } else {
> > +                             /* No more retry modes; real failure */
> > +                             ecc_failed = true;
> > +                     }
> > +             } else if (ret == -EBADMSG) {
>
> Rest lgtm.
>
> Thanks,
> Miquèl

Thanks,
Cheng Ming Lin

Re: [PATCH v3 1/2] mtd: spi-nand: Add read retry support

Posted by Miquel Raynal 12 months ago

Hello,

>> > +                             ret = spinand->set_read_retry(spinand, retry_mode);
>> > +                             if (ret < 0) {
>> > +                                     ecc_failed = true;
>> > +                                     return ret;
>>
>> Shall we try to set the read_retry level to 0 upon:
>>
>>       if (ret < 0 && retry_mode > 1)
>>
>> ?
>
> If we set the read_retry level to 0 at that point, and set_read_retry
> fails when retry_mode equals 1, it won't return an error. This could
> potentially mask an underlying issue.

Don't save the return value in this case? But otherwise you would leave
the chip in a retry state, no?
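
As a rough sketch of what "don't save the return value" could look like,
the user-space model below attempts a best-effort reset to level 0 when
programming a retry level fails, while still reporting the original
error. struct fake_chip, fake_set_read_retry() and enter_retry_mode() are
hypothetical stand-ins for this illustration, not part of the patch or
the spinand API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct spinand_device. */
struct fake_chip {
	unsigned int level;	/* retry level currently programmed */
	unsigned int fail_at;	/* level whose programming fails (0 = never) */
};

/* Hypothetical stand-in for the ->set_read_retry() hook. */
static int fake_set_read_retry(struct fake_chip *chip, unsigned int level)
{
	if (chip->fail_at && level == chip->fail_at)
		return -EIO;

	chip->level = level;
	return 0;
}

/*
 * Program retry_mode; if that fails after an earlier level was already
 * set, try to go back to level 0 but keep the original error code.
 */
static int enter_retry_mode(struct fake_chip *chip, unsigned int retry_mode)
{
	int ret;

	assert(chip);

	ret = fake_set_read_retry(chip, retry_mode);
	if (ret < 0 && retry_mode > 1)
		(void)fake_set_read_retry(chip, 0);	/* best effort, result ignored */

	return ret;
}
```

Because the result of the reset is discarded, the caller still sees the
error from the failed level change, so nothing is masked.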

Thanks,
Miquèl

Re: [PATCH v3 1/2] mtd: spi-nand: Add read retry support

Posted by Cheng Ming Lin 12 months ago

Hi Miquel,

Miquel Raynal <miquel.raynal@bootlin.com> wrote on Mon, Feb 10, 2025 at 6:07 PM:
>
> Hello,
>
> >> > +                             ret = spinand->set_read_retry(spinand, retry_mode);
> >> > +                             if (ret < 0) {
> >> > +                                     ecc_failed = true;
> >> > +                                     return ret;
> >>
> >> Shall we try to set the read_retry level to 0 upon:
> >>
> >>       if (ret < 0 && retry_mode > 1)
> >>
> >> ?
> >
> > > If we set the read_retry level to 0 at that point, and set_read_retry
> > > fails when retry_mode equals 1, it won't return an error. This could
> > > potentially mask an underlying issue.
>
> Don't save the return value in this case? But otherwise you would leave
> the chip in a retry state, no?

However, even if we try to set the read_retry level to 0 at that point,
the chip would still remain in the retry state if set_read_retry fails.

I came up with a solution: setting the read_retry level to 0 right before
the read_retry label. This ensures that subsequent reads always start
from level 0, and it eliminates the need to reset the read_retry level at
the end.

>
> Thanks,
> Miquèl

Thanks,
Cheng Ming Lin

Re: [PATCH v3 1/2] mtd: spi-nand: Add read retry support

Posted by Miquel Raynal 12 months ago

On 11/02/2025 at 16:13:32 +08, Cheng Ming Lin <linchengming884@gmail.com> wrote:

> Hi Miquel,
>
> Miquel Raynal <miquel.raynal@bootlin.com> wrote on Mon, Feb 10, 2025 at 6:07 PM:
>>
>> Hello,
>>
>> >> > +                             ret = spinand->set_read_retry(spinand, retry_mode);
>> >> > +                             if (ret < 0) {
>> >> > +                                     ecc_failed = true;
>> >> > +                                     return ret;
>> >>
>> >> Shall we try to set the read_retry level to 0 upon:
>> >>
>> >>       if (ret < 0 && retry_mode > 1)
>> >>
>> >> ?
>> >
>> > If we set the read_retry level to 0 at that point, and set_read_retry
>> > fails when retry_mode equals 1, it won't return an error. This could
>> > potentially mask an underlying issue.
>>
>> Don't save the return value in this case? But otherwise you would leave
>> the chip in a retry state, no?
>
> However, even if we try to set the read_retry level to 0 at that point,
> the chip would still remain in the retry state if set_read_retry fails.

Well, if even this fails, you have bigger troubles than read_retry being
enabled.

>
> I came up with a solution: setting the read_retry level to 0 right before
> the read_retry label. This ensures that subsequent reads always start
> from level 0, and it eliminates the need to reset the read_retry level at
> the end.

I don't see the difference with the previous solution. If it fails, it's
exactly the same.

Please do not add a function call to *every* read regardless of whether
read_retry was actually enabled (which I think is what you are suggesting).
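
To put a number on that cost, here is a small user-space simulation
(hypothetical names, not kernel code) that counts set_read_retry-style
calls over a run of page reads in which a single page needs one retry.
It compares an unconditional reset before every page, as I read the
proposal, with the conditional reset already present in v3.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int set_calls;	/* simulated set_read_retry() invocations */

/* Hypothetical stand-in for the ->set_read_retry() hook. */
static void fake_set_read_retry(unsigned int level)
{
	(void)level;
	set_calls++;
}

/*
 * Read `pages` pages where only page index `bad_page` needs one retry.
 * reset_always = true:  program level 0 before every page.
 * reset_always = false: program level 0 only after a retry was used.
 */
static unsigned int count_set_calls(unsigned int pages, unsigned int bad_page,
				    bool reset_always)
{
	assert(pages > 0);
	set_calls = 0;

	for (unsigned int i = 0; i < pages; i++) {
		unsigned int retry_mode = 0;

		if (reset_always)
			fake_set_read_retry(0);

		if (i == bad_page) {
			retry_mode = 1;
			fake_set_read_retry(retry_mode);
		}

		if (!reset_always && retry_mode)
			fake_set_read_retry(0);
	}

	return set_calls;
}
```

In this model, the unconditional variant issues one extra call per page
read, whereas the conditional variant only pays for pages that actually
entered a retry mode.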

Thanks,
Miquèl