[PATCH RFC 4/4] block: use chunk_sectors when evaluating stacked atomic write limits

John Garry posted 4 patches 6 months, 2 weeks ago
Posted by John Garry 6 months, 2 weeks ago
The atomic write unit max is limited by the stripe size of any stacked device.

It is required that the atomic write unit is a power-of-2 factor of the
stripe size.

Currently we use the io_min limit to hold the stripe size, and check for an
io_min <= SECTOR_SIZE when deciding if we have a striped stacked device.

Nilay reports that this causes a problem when the physical block size is
greater than SECTOR_SIZE [0].

Furthermore, io_min may be mutated when stacking devices, and this makes
it a poor candidate to hold the stripe size. One such case is when io_min
is less than the physical block size.

Use chunk_sectors to hold the stripe size, which is more appropriate.

[0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df771@linux.ibm.com/T/#mecca17129f72811137d3c2f1e477634e77f06781

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..5b0f1a854e81 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -594,11 +594,13 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
 static bool blk_stack_atomic_writes_head(struct queue_limits *t,
 				struct queue_limits *b)
 {
+	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;
+
 	if (b->atomic_write_hw_boundary &&
 	    !blk_stack_atomic_writes_boundary_head(t, b))
 		return false;
 
-	if (t->io_min <= SECTOR_SIZE) {
+	if (!t->chunk_sectors) {
 		/* No chunk sectors, so use bottom device values directly */
 		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
 		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
@@ -617,12 +619,12 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
 	 * aligned with both limits, i.e. 8K in this example.
 	 */
 	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
-	while (t->io_min % t->atomic_write_hw_unit_max)
+	while (chunk_size % t->atomic_write_hw_unit_max)
 		t->atomic_write_hw_unit_max /= 2;
 
 	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
 					  t->atomic_write_hw_unit_max);
-	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
+	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
 
 	return true;
 }
-- 
2.31.1
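
For readers following the arithmetic, here is a minimal user-space sketch of
the limit calculation as changed by this patch. It is not the kernel code:
struct atomic_limits, min_uint() and stack_atomic_limits() are hypothetical
stand-ins, and the copy of hw_max in the no-chunk branch is an assumption,
since that line falls outside the diff context. It reproduces the example
from the in-code comment, where a bottom unit max of 16K is reduced to 8K.

```c
#include <assert.h>

#define SECTOR_SHIFT 9

/* Hypothetical stand-in for the queue_limits fields touched by the patch. */
struct atomic_limits {
	unsigned int chunk_sectors; /* stripe size in 512-byte sectors, 0 if none */
	unsigned int unit_min;      /* atomic_write_hw_unit_min, in bytes */
	unsigned int unit_max;      /* atomic_write_hw_unit_max, in bytes */
	unsigned int hw_max;        /* atomic_write_hw_max, in bytes */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Stack the bottom device limits b onto the top device limits t. */
static void stack_atomic_limits(struct atomic_limits *t,
				const struct atomic_limits *b)
{
	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;

	/* The halving loop below relies on a non-zero power-of-2 unit max. */
	assert(b->unit_max && (b->unit_max & (b->unit_max - 1)) == 0);

	if (!t->chunk_sectors) {
		/* No chunk sectors, so use bottom device values directly. */
		t->unit_max = b->unit_max;
		t->unit_min = b->unit_min;
		t->hw_max = b->hw_max; /* assumption: not visible in the diff */
		return;
	}

	/* Halve unit_max until it evenly divides the chunk size. */
	t->unit_max = b->unit_max;
	while (chunk_size % t->unit_max)
		t->unit_max /= 2;

	t->unit_min = min_uint(b->unit_min, t->unit_max);
	t->hw_max = min_uint(b->hw_max, chunk_size);
}
```

With chunk_sectors = 48 (24K) on the top device and a bottom device
reporting unit_min = 4K, unit_max = 16K and hw_max = 64K, the sketch yields
unit_max = 8K, unit_min = 4K and hw_max = 24K.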
Re: [PATCH RFC 4/4] block: use chunk_sectors when evaluating stacked atomic write limits
Posted by Nilay Shroff 6 months, 2 weeks ago

On 6/5/25 8:38 PM, John Garry wrote:
> The atomic write unit max is limited by the stripe size of any stacked device.
> 
> It is required that the atomic write unit is a power-of-2 factor of the
> stripe size.
> 
> Currently we use the io_min limit to hold the stripe size, and check for an
> io_min <= SECTOR_SIZE when deciding if we have a striped stacked device.
> 
> Nilay reports that this causes a problem when the physical block size is
> greater than SECTOR_SIZE [0].
> 
> Furthermore, io_min may be mutated when stacking devices, and this makes
> it a poor candidate to hold the stripe size. One such case is when io_min
> is less than the physical block size.
> 
> Use chunk_sectors to hold the stripe size, which is more appropriate.
> 
> [0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df771@linux.ibm.com/T/#mecca17129f72811137d3c2f1e477634e77f06781
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-settings.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a000daafbfb4..5b0f1a854e81 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -594,11 +594,13 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
>  static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>  				struct queue_limits *b)
>  {
> +	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;
> +
>  	if (b->atomic_write_hw_boundary &&
>  	    !blk_stack_atomic_writes_boundary_head(t, b))
>  		return false;
>  
> -	if (t->io_min <= SECTOR_SIZE) {
> +	if (!t->chunk_sectors) {
>  		/* No chunk sectors, so use bottom device values directly */
>  		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
>  		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
> @@ -617,12 +619,12 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>  	 * aligned with both limits, i.e. 8K in this example.
>  	 */
>  	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
> -	while (t->io_min % t->atomic_write_hw_unit_max)
> +	while (chunk_size % t->atomic_write_hw_unit_max)
>  		t->atomic_write_hw_unit_max /= 2;
>  
>  	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
>  					  t->atomic_write_hw_unit_max);
> -	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
> +	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
>  
>  	return true;
>  }

This works well with my NVMe disk, which supports atomic writes. My only
concern is what happens if t->chunk_sectors is also defined for the NVMe disk.
I see that nvme_set_chunk_sectors() initializes chunk_sectors for NVMe.
The value assigned to lim->chunk_sectors in nvme_set_chunk_sectors()
represents "noiob" (i.e. Namespace Optimal I/O Boundary). My disk has "noiob"
set to zero, but if it were non-zero, would it break the above logic
for NVMe atomic writes?

Thanks,
--Nilay
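
To make the concern above concrete, here is a small user-space sketch. It
assumes, for illustration only, that stacking folds a bottom device's
chunk_sectors into the top device with a gcd before the atomic write limits
are evaluated. The helper names gcd_uint(), stacked_chunk_sectors() and
takes_chunk_path() are hypothetical, and the NOIOB value of 256 sectors is
made up. Under that assumption, a non-striped top device stacked on a bottom
device with a non-zero NOIOB would no longer take the "no chunk sectors"
branch in the patched check.

```c
#include <assert.h>
#include <stdbool.h>

/* Greatest common divisor (Euclid); note that gcd(0, x) == x. */
static unsigned int gcd_uint(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int r = a % b;

		a = b;
		b = r;
	}
	return a;
}

/*
 * ASSUMPTION: the top device's chunk_sectors after stacking is the gcd of
 * the top and bottom values whenever the bottom value is non-zero.
 */
static unsigned int stacked_chunk_sectors(unsigned int top,
					  unsigned int bottom)
{
	return bottom ? gcd_uint(top, bottom) : top;
}

/* Mirrors the patched check: true when !t->chunk_sectors is false. */
static bool takes_chunk_path(unsigned int chunk_sectors)
{
	return chunk_sectors != 0;
}
```

With a bottom NOIOB of zero, the top device keeps chunk_sectors == 0 and the
bottom values are used directly. With a bottom NOIOB of 256 sectors, the
sketch gives the top device chunk_sectors == 256, so the stripe-size path is
taken even though the top device itself is not striped.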
Re: [PATCH RFC 4/4] block: use chunk_sectors when evaluating stacked atomic write limits
Posted by John Garry 6 months, 1 week ago
On 06/06/2025 16:23, Nilay Shroff wrote:
>>   	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
>>   					  t->atomic_write_hw_unit_max);
>> -	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
>> +	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
>>   
>>   	return true;
>>   }
> This works well with my NVMe disk, which supports atomic writes. My only
> concern is what happens if t->chunk_sectors is also defined for the NVMe disk.
> I see that nvme_set_chunk_sectors() initializes chunk_sectors for NVMe.
> The value assigned to lim->chunk_sectors in nvme_set_chunk_sectors()
> represents "noiob" (i.e. Namespace Optimal I/O Boundary). My disk has "noiob"
> set to zero, but if it were non-zero, would it break the above logic
> for NVMe atomic writes?

Yeah, I think that I need to change the code to account for the bottom 
device setting chunk_sectors.

Thanks,
John