[PATCH 05/12] badblocks: return error if any badblock set fails

Zheng Qixing posted 12 patches 9 months, 4 weeks ago
There is a newer version of this series
[PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Zheng Qixing 9 months, 4 weeks ago
From: Li Nan <linan122@huawei.com>

_badblocks_set() returns success if at least one badblock is set
successfully, even if others fail. This can lead to data inconsistencies
in raid, where a failed badblock set should trigger the disk to be kicked
out to prevent future reads from failed write areas.

_badblocks_set() should return error if any badblock set fails. Instead
of relying on 'rv', directly returning 'sectors' for clearer logic. If all
badblocks are successfully set, 'sectors' will be 0, otherwise it
indicates the number of badblocks that have not been set yet, thus
signaling failure.

By the way, it can also fix an issue: when a newly set unack badblock is
included in an existing ack badblock, the setting will return an error.
···
  echo "0 100" /sys/block/md0/md/dev-loop1/bad_blocks
  echo "0 100" /sys/block/md0/md/dev-loop1/unacknowledged_bad_blocks
  -bash: echo: write error: No space left on device
```
After fix, it will return success.

Fixes: aa511ff8218b ("badblocks: switch to the improved badblock handling code")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 block/badblocks.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/block/badblocks.c b/block/badblocks.c
index 1c8b8f65f6df..a953d2e9417f 100644
--- a/block/badblocks.c
+++ b/block/badblocks.c
@@ -843,7 +843,6 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
 	struct badblocks_context bad;
 	int prev = -1, hint = -1;
 	unsigned long flags;
-	int rv = 0;
 	u64 *p;
 
 	if (bb->shift < 0)
@@ -873,10 +872,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
 	bad.len = sectors;
 	len = 0;
 
-	if (badblocks_full(bb)) {
-		rv = 1;
+	if (badblocks_full(bb))
 		goto out;
-	}
 
 	if (badblocks_empty(bb)) {
 		len = insert_at(bb, 0, &bad);
@@ -916,10 +913,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
 			int extra = 0;
 
 			if (!can_front_overwrite(bb, prev, &bad, &extra)) {
-				if (extra > 0) {
-					rv = 1;
+				if (extra > 0)
 					goto out;
-				}
 
 				len = min_t(sector_t,
 					    BB_END(p[prev]) - s, sectors);
@@ -986,10 +981,7 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
 
 	write_sequnlock_irqrestore(&bb->lock, flags);
 
-	if (!added)
-		rv = 1;
-
-	return rv;
+	return sectors;
 }
 
 /*
@@ -1353,7 +1345,7 @@ EXPORT_SYMBOL_GPL(badblocks_check);
  *
  * Return:
  *  0: success
- *  1: failed to set badblocks (out of space)
+ *  other: failed to set badblocks (out of space)
  */
 int badblocks_set(struct badblocks *bb, sector_t s, int sectors,
 			int acknowledged)
-- 
2.39.2

Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Coly Li 9 months, 3 weeks ago
On Fri, Feb 21, 2025 at 04:11:02PM +0800, Zheng Qixing wrote:
> From: Li Nan <linan122@huawei.com>
> 
> _badblocks_set() returns success if at least one badblock is set
> successfully, even if others fail. This can lead to data inconsistencies
> in raid, where a failed badblock set should trigger the disk to be kicked
> out to prevent future reads from failed write areas.
> 
> _badblocks_set() should return error if any badblock set fails. Instead
> of relying on 'rv', directly returning 'sectors' for clearer logic. If all
> badblocks are successfully set, 'sectors' will be 0, otherwise it
> indicates the number of badblocks that have not been set yet, thus
> signaling failure.
> 
> By the way, it can also fix an issue: when a newly set unack badblock is
> included in an existing ack badblock, the setting will return an error.
> ···
>   echo "0 100" /sys/block/md0/md/dev-loop1/bad_blocks
>   echo "0 100" /sys/block/md0/md/dev-loop1/unacknowledged_bad_blocks
>   -bash: echo: write error: No space left on device
> ```
> After fix, it will return success.
> 
> Fixes: aa511ff8218b ("badblocks: switch to the improved badblock handling code")
> Signed-off-by: Li Nan <linan122@huawei.com>

Based on Kuai's reply that we dont handle partial set and treat it as failure,
I am fine with this patch.

It will be good to add comment to explain that the parital set condition will be
treated as failure.

Acked-by: Coly Li <colyli@kernel.org>

Thanks.


> ---
>  block/badblocks.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/block/badblocks.c b/block/badblocks.c
> index 1c8b8f65f6df..a953d2e9417f 100644
> --- a/block/badblocks.c
> +++ b/block/badblocks.c
> @@ -843,7 +843,6 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  	struct badblocks_context bad;
>  	int prev = -1, hint = -1;
>  	unsigned long flags;
> -	int rv = 0;
>  	u64 *p;
>  
>  	if (bb->shift < 0)
> @@ -873,10 +872,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  	bad.len = sectors;
>  	len = 0;
>  
> -	if (badblocks_full(bb)) {
> -		rv = 1;
> +	if (badblocks_full(bb))
>  		goto out;
> -	}
>  
>  	if (badblocks_empty(bb)) {
>  		len = insert_at(bb, 0, &bad);
> @@ -916,10 +913,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  			int extra = 0;
>  
>  			if (!can_front_overwrite(bb, prev, &bad, &extra)) {
> -				if (extra > 0) {
> -					rv = 1;
> +				if (extra > 0)
>  					goto out;
> -				}
>  
>  				len = min_t(sector_t,
>  					    BB_END(p[prev]) - s, sectors);
> @@ -986,10 +981,7 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  
>  	write_sequnlock_irqrestore(&bb->lock, flags);
>  
> -	if (!added)
> -		rv = 1;
> -
> -	return rv;
> +	return sectors;
>  }
>  
>  /*
> @@ -1353,7 +1345,7 @@ EXPORT_SYMBOL_GPL(badblocks_check);
>   *
>   * Return:
>   *  0: success
> - *  1: failed to set badblocks (out of space)
> + *  other: failed to set badblocks (out of space)
>   */
>  int badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  			int acknowledged)
> -- 
> 2.39.2
> 

-- 
Coly Li
Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Coly Li 9 months, 4 weeks ago
On Fri, Feb 21, 2025 at 04:11:02PM +0800, Zheng Qixing wrote:
> From: Li Nan <linan122@huawei.com>
> 
> _badblocks_set() returns success if at least one badblock is set
> successfully, even if others fail. This can lead to data inconsistencies
> in raid, where a failed badblock set should trigger the disk to be kicked
> out to prevent future reads from failed write areas.
> 
> _badblocks_set() should return error if any badblock set fails. Instead
> of relying on 'rv', directly returning 'sectors' for clearer logic. If all
> badblocks are successfully set, 'sectors' will be 0, otherwise it
> indicates the number of badblocks that have not been set yet, thus
> signaling failure.
> 
> By the way, it can also fix an issue: when a newly set unack badblock is
> included in an existing ack badblock, the setting will return an error.
> ···
>   echo "0 100" /sys/block/md0/md/dev-loop1/bad_blocks
>   echo "0 100" /sys/block/md0/md/dev-loop1/unacknowledged_bad_blocks
>   -bash: echo: write error: No space left on device
> ```
> After fix, it will return success.
> 
> Fixes: aa511ff8218b ("badblocks: switch to the improved badblock handling code")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  block/badblocks.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
>

NACK.   Such modification will break current API.

Current API doesn’t handle partial success/fail condition, if any bad block range is set it should return true.

It is not about correct or wrong, just current fragile design. A new API is necessary to handle such thing. This is why I leave the return value as int other than bool.

Thanks.

Coly Li


 
> diff --git a/block/badblocks.c b/block/badblocks.c
> index 1c8b8f65f6df..a953d2e9417f 100644
> --- a/block/badblocks.c
> +++ b/block/badblocks.c
> @@ -843,7 +843,6 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  	struct badblocks_context bad;
>  	int prev = -1, hint = -1;
>  	unsigned long flags;
> -	int rv = 0;
>  	u64 *p;
>  
>  	if (bb->shift < 0)
> @@ -873,10 +872,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  	bad.len = sectors;
>  	len = 0;
>  
> -	if (badblocks_full(bb)) {
> -		rv = 1;
> +	if (badblocks_full(bb))
>  		goto out;
> -	}
>  
>  	if (badblocks_empty(bb)) {
>  		len = insert_at(bb, 0, &bad);
> @@ -916,10 +913,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  			int extra = 0;
>  
>  			if (!can_front_overwrite(bb, prev, &bad, &extra)) {
> -				if (extra > 0) {
> -					rv = 1;
> +				if (extra > 0)
>  					goto out;
> -				}
>  
>  				len = min_t(sector_t,
>  					    BB_END(p[prev]) - s, sectors);
> @@ -986,10 +981,7 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  
>  	write_sequnlock_irqrestore(&bb->lock, flags);
>  
> -	if (!added)
> -		rv = 1;
> -
> -	return rv;
> +	return sectors;
>  }
>  
>  /*
> @@ -1353,7 +1345,7 @@ EXPORT_SYMBOL_GPL(badblocks_check);
>   *
>   * Return:
>   *  0: success
> - *  1: failed to set badblocks (out of space)
> + *  other: failed to set badblocks (out of space)
>   */
>  int badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>  			int acknowledged)
> -- 
> 2.39.2
> 

-- 
Coly Li
Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Yu Kuai 9 months, 4 weeks ago
Hi,

在 2025/02/21 17:52, Coly Li 写道:
> On Fri, Feb 21, 2025 at 04:11:02PM +0800, Zheng Qixing wrote:
>> From: Li Nan <linan122@huawei.com>
>>
>> _badblocks_set() returns success if at least one badblock is set
>> successfully, even if others fail. This can lead to data inconsistencies
>> in raid, where a failed badblock set should trigger the disk to be kicked
>> out to prevent future reads from failed write areas.
>>
>> _badblocks_set() should return error if any badblock set fails. Instead
>> of relying on 'rv', directly returning 'sectors' for clearer logic. If all
>> badblocks are successfully set, 'sectors' will be 0, otherwise it
>> indicates the number of badblocks that have not been set yet, thus
>> signaling failure.
>>
>> By the way, it can also fix an issue: when a newly set unack badblock is
>> included in an existing ack badblock, the setting will return an error.
>> ···
>>    echo "0 100" /sys/block/md0/md/dev-loop1/bad_blocks
>>    echo "0 100" /sys/block/md0/md/dev-loop1/unacknowledged_bad_blocks
>>    -bash: echo: write error: No space left on device
>> ```
>> After fix, it will return success.
>>
>> Fixes: aa511ff8218b ("badblocks: switch to the improved badblock handling code")
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>>   block/badblocks.c | 16 ++++------------
>>   1 file changed, 4 insertions(+), 12 deletions(-)
>>
> 
> NACK.   Such modification will break current API.

Take a look at current APIs:
- for raid, error should be returned, otherwise data may be corrupted.
- for nvdimm, there is only error message if fail, and it make sense as
well if any badblocks set failed:
         if (badblocks_set(bb, s, num, 1))
                 dev_info_once(bb->dev, "%s: failed for sector %llx\n",
                                 __func__, (u64) s);
- for null_blk, I think it's fine as well.

Hence I think it's fine to return error if any badblocks set failed.
There is no need to invent a new API and switch all callers to a new
API.

Thanks,
Kuai

> 
> Current API doesn’t handle partial success/fail condition, if any bad block range is set it should return true.
> 
> It is not about correct or wrong, just current fragile design. A new API is necessary to handle such thing. This is why I leave the return value as int other than bool.
> 
> Thanks.
> 
> Coly Li
> 
> 
>   
>> diff --git a/block/badblocks.c b/block/badblocks.c
>> index 1c8b8f65f6df..a953d2e9417f 100644
>> --- a/block/badblocks.c
>> +++ b/block/badblocks.c
>> @@ -843,7 +843,6 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>>   	struct badblocks_context bad;
>>   	int prev = -1, hint = -1;
>>   	unsigned long flags;
>> -	int rv = 0;
>>   	u64 *p;
>>   
>>   	if (bb->shift < 0)
>> @@ -873,10 +872,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>>   	bad.len = sectors;
>>   	len = 0;
>>   
>> -	if (badblocks_full(bb)) {
>> -		rv = 1;
>> +	if (badblocks_full(bb))
>>   		goto out;
>> -	}
>>   
>>   	if (badblocks_empty(bb)) {
>>   		len = insert_at(bb, 0, &bad);
>> @@ -916,10 +913,8 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>>   			int extra = 0;
>>   
>>   			if (!can_front_overwrite(bb, prev, &bad, &extra)) {
>> -				if (extra > 0) {
>> -					rv = 1;
>> +				if (extra > 0)
>>   					goto out;
>> -				}
>>   
>>   				len = min_t(sector_t,
>>   					    BB_END(p[prev]) - s, sectors);
>> @@ -986,10 +981,7 @@ static int _badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>>   
>>   	write_sequnlock_irqrestore(&bb->lock, flags);
>>   
>> -	if (!added)
>> -		rv = 1;
>> -
>> -	return rv;
>> +	return sectors;
>>   }
>>   
>>   /*
>> @@ -1353,7 +1345,7 @@ EXPORT_SYMBOL_GPL(badblocks_check);
>>    *
>>    * Return:
>>    *  0: success
>> - *  1: failed to set badblocks (out of space)
>> + *  other: failed to set badblocks (out of space)
>>    */
>>   int badblocks_set(struct badblocks *bb, sector_t s, int sectors,
>>   			int acknowledged)
>> -- 
>> 2.39.2
>>
> 

Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Coly Li 9 months, 4 weeks ago

> 2025年2月21日 18:09,Yu Kuai <yukuai1@huaweicloud.com> 写道:
> 
> Hi,
> 
> 在 2025/02/21 17:52, Coly Li 写道:
>> On Fri, Feb 21, 2025 at 04:11:02PM +0800, Zheng Qixing wrote:
>>> From: Li Nan <linan122@huawei.com>
>>> 
>>> _badblocks_set() returns success if at least one badblock is set
>>> successfully, even if others fail. This can lead to data inconsistencies
>>> in raid, where a failed badblock set should trigger the disk to be kicked
>>> out to prevent future reads from failed write areas.
>>> 
>>> _badblocks_set() should return error if any badblock set fails. Instead
>>> of relying on 'rv', directly returning 'sectors' for clearer logic. If all
>>> badblocks are successfully set, 'sectors' will be 0, otherwise it
>>> indicates the number of badblocks that have not been set yet, thus
>>> signaling failure.
>>> 
>>> By the way, it can also fix an issue: when a newly set unack badblock is
>>> included in an existing ack badblock, the setting will return an error.
>>> ···
>>>   echo "0 100" /sys/block/md0/md/dev-loop1/bad_blocks
>>>   echo "0 100" /sys/block/md0/md/dev-loop1/unacknowledged_bad_blocks
>>>   -bash: echo: write error: No space left on device
>>> ```
>>> After fix, it will return success.
>>> 
>>> Fixes: aa511ff8218b ("badblocks: switch to the improved badblock handling code")
>>> Signed-off-by: Li Nan <linan122@huawei.com>
>>> ---
>>>  block/badblocks.c | 16 ++++------------
>>>  1 file changed, 4 insertions(+), 12 deletions(-)
>>> 
>> NACK.   Such modification will break current API.
> 
> Take a look at current APIs:
> - for raid, error should be returned, otherwise data may be corrupted.
> - for nvdimm, there is only error message if fail, and it make sense as
> well if any badblocks set failed:
>        if (badblocks_set(bb, s, num, 1))
>                dev_info_once(bb->dev, "%s: failed for sector %llx\n",
>                                __func__, (u64) s);
> - for null_blk, I think it's fine as well.
> 
> Hence I think it's fine to return error if any badblocks set failed.
> There is no need to invent a new API and switch all callers to a new
> API.

So we don’t need to add a negative return value for partial success/failure?

Coly Li
Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Yu Kuai 9 months, 3 weeks ago
Hi,

在 2025/02/21 18:12, Coly Li 写道:
> So we don’t need to add a negative return value for partial success/failure?
> 
> Coly Li.

Yes, I think so. No one really use this value, and patch 10 clean this
up by changing return type to bool.

Thanks,
Kuai

Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Coly Li 9 months, 3 weeks ago
On Sat, Feb 22, 2025 at 09:12:53AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2025/02/21 18:12, Coly Li 写道:
> > So we don’t need to add a negative return value for partial success/failure?
> > 
> > Coly Li.
> 
> Yes, I think so. No one really use this value, and patch 10 clean this
> up by changing return type to bool.

OK, then it is fine to me.

It will be good to add a code comment that parital setting will be treated as failure.

Thanks.


-- 
Coly Li
Re: [PATCH 05/12] badblocks: return error if any badblock set fails
Posted by Li Nan 9 months, 3 weeks ago

在 2025/2/22 14:16, Coly Li 写道:
> On Sat, Feb 22, 2025 at 09:12:53AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/02/21 18:12, Coly Li 写道:
>>> So we don’t need to add a negative return value for partial success/failure?
>>>
>>> Coly Li.
>>
>> Yes, I think so. No one really use this value, and patch 10 clean this
>> up by changing return type to bool.
> 
> OK, then it is fine to me.
> 
> It will be good to add a code comment that parital setting will be treated as failure.
> 
> Thanks.
> 
> 

Thank you for your review. I will add comment in the next version.

-- 
Thanks,
Nan