The atomic write unit max value is limited by any stacked device stripe
size.
It is required that the atomic write unit is a power-of-2 factor of the
stripe size.
Currently we use the io_min limit to hold the stripe size, and check for
io_min <= SECTOR_SIZE when deciding whether we have a striped stacked
device.
Nilay reports that this causes a problem when the physical block size is
greater than SECTOR_SIZE [0].
Furthermore, io_min may be mutated when stacking devices, which makes it
a poor candidate to hold the stripe size. One example of when io_min may
change is when it is less than the physical block size.
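To illustrate that mutation, the stacking logic bumps io_min roughly along
these lines (a paraphrase of the effect, not the exact blk_stack_limits()
code):

	t->io_min = max(t->io_min, b->io_min);
	/* io_min is then raised to at least the physical block size */
	if (t->io_min < t->physical_block_size)
		t->io_min = t->physical_block_size;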
Use chunk_sectors to hold the stripe size, which is more appropriate.
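As background (illustrative only, not part of this patch): the largest
power-of-2 factor of a value is its lowest set bit, which is all a helper
like the max_pow_of_two_factor() used below needs to compute. A minimal
sketch:

	static inline unsigned int max_pow_of_two_factor(unsigned int x)
	{
		/* for non-zero x, x & -x isolates the lowest set bit,
		 * e.g. 24K -> 8K */
		return x & -x;
	}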
[0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df771@linux.ibm.com/T/#mecca17129f72811137d3c2f1e477634e77f06781
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
block/blk-settings.c | 58 ++++++++++++++++++++++++++------------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 761c6ccf5af7..3259cfac5d0d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -597,41 +597,52 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
 	return true;
 }
 
-
-/* Check stacking of first bottom device */
-static bool blk_stack_atomic_writes_head(struct queue_limits *t,
-				struct queue_limits *b)
+static void blk_stack_atomic_writes_chunk_sectors(struct queue_limits *t)
 {
-	if (b->atomic_write_hw_boundary &&
-	    !blk_stack_atomic_writes_boundary_head(t, b))
-		return false;
+	unsigned int chunk_sectors = t->chunk_sectors, chunk_bytes;
 
-	if (t->io_min <= SECTOR_SIZE) {
-		/* No chunk sectors, so use bottom device values directly */
-		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
-		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
-		t->atomic_write_hw_max = b->atomic_write_hw_max;
-		return true;
-	}
+	if (!chunk_sectors)
+		return;
+
+	/*
+	 * If chunk sectors is so large that its value in bytes overflows
+	 * UINT_MAX, then just shift it down so it definitely will fit.
+	 * We don't support atomic writes of such a large size anyway.
+	 */
+	if ((unsigned long)chunk_sectors << SECTOR_SHIFT > UINT_MAX)
+		chunk_bytes = chunk_sectors;
+	else
+		chunk_bytes = chunk_sectors << SECTOR_SHIFT;
 
 	/*
 	 * Find values for limits which work for chunk size.
 	 * b->atomic_write_hw_unit_{min, max} may not be aligned with chunk
-	 * size (t->io_min), as chunk size is not restricted to a power-of-2.
+	 * size, as the chunk size is not restricted to a power-of-2.
 	 * So we need to find highest power-of-2 which works for the chunk
 	 * size.
-	 * As an example scenario, we could have b->unit_max = 16K and
-	 * t->io_min = 24K. For this case, reduce t->unit_max to a value
-	 * aligned with both limits, i.e. 8K in this example.
+	 * As an example scenario, we could have t->unit_max = 16K and
+	 * t->chunk_sectors = 24KB. For this case, reduce t->unit_max to a
+	 * value aligned with both limits, i.e. 8K in this example.
 	 */
-	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
-	while (t->io_min % t->atomic_write_hw_unit_max)
-		t->atomic_write_hw_unit_max /= 2;
+	t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
+					max_pow_of_two_factor(chunk_bytes));
 
-	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
+	t->atomic_write_hw_unit_min = min(t->atomic_write_hw_unit_min,
 					t->atomic_write_hw_unit_max);
-	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
+	t->atomic_write_hw_max = min(t->atomic_write_hw_max, chunk_bytes);
+}
+/* Check stacking of first bottom device */
+static bool blk_stack_atomic_writes_head(struct queue_limits *t,
+				struct queue_limits *b)
+{
+	if (b->atomic_write_hw_boundary &&
+	    !blk_stack_atomic_writes_boundary_head(t, b))
+		return false;
+
+	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
+	t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
+	t->atomic_write_hw_max = b->atomic_write_hw_max;
 	return true;
 }
 
@@ -659,6 +670,7 @@ static void blk_stack_atomic_writes_limits(struct queue_limits *t,
 	if (!blk_stack_atomic_writes_head(t, b))
 		goto unsupported;
 
+	blk_stack_atomic_writes_chunk_sectors(t);
 	return;
 
 unsupported:
--
2.43.5
On 7/7/25 6:41 PM, John Garry wrote:
> [...]
> +	/*
> +	 * If chunk sectors is so large that its value in bytes overflows
> +	 * UINT_MAX, then just shift it down so it definitely will fit.
> +	 * We don't support atomic writes of such a large size anyway.
> +	 */
> +	if ((unsigned long)chunk_sectors << SECTOR_SHIFT > UINT_MAX)
> +		chunk_bytes = chunk_sectors;
> +	else
> +		chunk_bytes = chunk_sectors << SECTOR_SHIFT;
>
Can we use check_shl_overflow() here for checking overflow? (See the
untested sketch after this paragraph.) Otherwise, the changes look good
to me. I've also tested them using my NVMe disk, which supports up to
256KB of atomic writes.
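For illustration, a minimal sketch of the check_shl_overflow() suggestion
(untested; the helper returns true on overflow and stores the shifted
value in its destination argument):

	if (check_shl_overflow(chunk_sectors, SECTOR_SHIFT, &chunk_bytes))
		chunk_bytes = chunk_sectors;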
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
On Tue, 8 Jul 2025, Nilay Shroff wrote:
>
> On 7/7/25 6:41 PM, John Garry wrote:
> [...]
> > +	/*
> > +	 * If chunk sectors is so large that its value in bytes overflows
> > +	 * UINT_MAX, then just shift it down so it definitely will fit.
> > +	 * We don't support atomic writes of such a large size anyway.
> > +	 */
> > +	if ((unsigned long)chunk_sectors << SECTOR_SHIFT > UINT_MAX)
> > +		chunk_bytes = chunk_sectors;
> > +	else
> > +		chunk_bytes = chunk_sectors << SECTOR_SHIFT;
Why do we cast it to unsigned long? unsigned long is 32-bit on 32-bit
machines, so the code will not detect the overflow in that case. We should
cast it to unsigned long long (or uint64_t).
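That is, something like this (untested):

	if ((unsigned long long)chunk_sectors << SECTOR_SHIFT > UINT_MAX)
		chunk_bytes = chunk_sectors;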
Mikulas
On 08/07/2025 17:59, Mikulas Patocka wrote:
>>> +	/*
>>> +	 * If chunk sectors is so large that its value in bytes overflows
>>> +	 * UINT_MAX, then just shift it down so it definitely will fit.
>>> +	 * We don't support atomic writes of such a large size anyway.
>>> +	 */
>>> +	if ((unsigned long)chunk_sectors << SECTOR_SHIFT > UINT_MAX)
>>> +		chunk_bytes = chunk_sectors;
>>> +	else
>>> +		chunk_bytes = chunk_sectors << SECTOR_SHIFT;
> Why do we cast it to unsigned long? unsigned long is 32-bit on 32-bit
> machines, so the code will not detect the overflow in that case. We should
> cast it to unsigned long long (or uint64_t).

Right, I said earlier that I would use an unsigned long long, but didn't
do it that way, which was unintentional.

Anyway, I will change this code as suggested by Nilay.

Thanks,
John
On 08/07/2025 13:27, Nilay Shroff wrote:
>> +	if ((unsigned long)chunk_sectors << SECTOR_SHIFT > UINT_MAX)
>> +		chunk_bytes = chunk_sectors;
>> +	else
>> +		chunk_bytes = chunk_sectors << SECTOR_SHIFT;
>>
> Can we use check_shl_overflow() here for checking overflow?

ok, I can change.

> Otherwise, the changes look good to me. I've also tested them using my
> NVMe disk, which supports up to 256KB of atomic writes.
>
> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
> Tested-by: Nilay Shroff <nilay@linux.ibm.com>

thanks