[PATCH 1/2] md/raid5: fix IO hang with degraded array with llbitmap

Yu Kuai posted 2 patches 2 weeks ago
Posted by Yu Kuai 2 weeks ago
When the llbitmap bit state is still unwritten, any new write must force
rcw (reconstruct-write); handle_stripe_dirtying() enforces this by
checking bitmap_ops->blocks_synced(). However, the same check is missing
from need_this_block(), so the stripe loops forever during handling:
handle_stripe() decides to call handle_stripe_fill(), but
need_this_block() always returns 0 and nothing is handled.

Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid5.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8dc98f545969..93e672b3432b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3751,9 +3751,14 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
 	struct r5dev *dev = &sh->dev[disk_idx];
 	struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]],
 				  &sh->dev[s->failed_num[1]] };
+	struct mddev *mddev = sh->raid_conf->mddev;
+	bool force_rcw = false;
 	int i;
-	bool force_rcw = (sh->raid_conf->rmw_level == PARITY_DISABLE_RMW);
 
+	if (sh->raid_conf->rmw_level == PARITY_DISABLE_RMW ||
+	    (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
+	     !mddev->bitmap_ops->blocks_synced(mddev, sh->sector)))
+		force_rcw = true;
 
 	if (test_bit(R5_LOCKED, &dev->flags) ||
 	    test_bit(R5_UPTODATE, &dev->flags))
-- 
2.51.0
Re: [PATCH 1/2] md/raid5: fix IO hang with degraded array with llbitmap
Posted by Li Nan 1 week, 6 days ago

On 2026/1/24 2:26, Yu Kuai wrote:
> When llbitmap bit state is still unwritten, any new write should force
> rcw, as bitmap_ops->blocks_synced() is checked in handle_stripe_dirting().

s/handle_stripe_dirting/handle_stripe_dirtying/

Besides this, LGTM

Reviewed-by: Li Nan <linan122@huawei.com>

> However, later the same check is missing in need_this_block(), causing
> stripe to deadloop during handling because handle_stripe() will decide
> to go to handle_stripe_fill(), meanwhile need_this_block() always return
> 0 and nothing is handled.
> 
> Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap")
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/raid5.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8dc98f545969..93e672b3432b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3751,9 +3751,14 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
>   	struct r5dev *dev = &sh->dev[disk_idx];
>   	struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]],
>   				  &sh->dev[s->failed_num[1]] };
> +	struct mddev *mddev = sh->raid_conf->mddev;
> +	bool force_rcw = false;
>   	int i;
> -	bool force_rcw = (sh->raid_conf->rmw_level == PARITY_DISABLE_RMW);
>   
> +	if (sh->raid_conf->rmw_level == PARITY_DISABLE_RMW ||
> +	    (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
> +	     !mddev->bitmap_ops->blocks_synced(mddev, sh->sector)))
> +		force_rcw = true;
>   
>   	if (test_bit(R5_LOCKED, &dev->flags) ||
>   	    test_bit(R5_UPTODATE, &dev->flags))

-- 
Thanks,
Nan
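
For readers following the thread, the force_rcw condition that the patch
adds to need_this_block() can be modelled in user space roughly as below.
This is only a sketch under stated assumptions: the struct layouts, the
first_unsynced field, the enum values, and the helpers
model_blocks_synced() and should_force_rcw() are hypothetical stand-ins
for illustration, not the kernel definitions. Only the shape of the
predicate is taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's definitions. */
typedef unsigned long long sector_t;

struct mddev;

struct bitmap_operations {
	/* Returns true when the region holding 'offset' is already in sync. */
	bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
};

struct mddev {
	const struct bitmap_operations *bitmap_ops;
	/* Model-only field: sectors at or above this are still unwritten. */
	sector_t first_unsynced;
};

/* Model-only values; the kernel defines its own constants. */
enum { PARITY_DISABLE_RMW = 0, PARITY_ENABLE_RMW = 1 };

/* Hypothetical callback standing in for a bitmap's blocks_synced(). */
static bool model_blocks_synced(struct mddev *mddev, sector_t offset)
{
	return offset < mddev->first_unsynced;
}

/*
 * Same predicate shape as the patched need_this_block(): force rcw when
 * rmw is disabled, or when a bitmap provides blocks_synced() and reports
 * that the stripe's sector is not yet synced.
 */
static bool should_force_rcw(int rmw_level, struct mddev *mddev,
			     sector_t sector)
{
	if (rmw_level == PARITY_DISABLE_RMW)
		return true;

	return mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
	       !mddev->bitmap_ops->blocks_synced(mddev, sector);
}
```

The two NULL checks mirror the ones in the diff: if no bitmap_ops is set,
or it provides no blocks_synced() callback, the decision falls back to
rmw_level alone, which is the behaviour before the patch.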