The atomic write unit max is limited by the stripe size of any stacked
device.

It is required that the atomic write unit is a power-of-2 factor of the
stripe size.

Currently we use the io_min limit to hold the stripe size, and check for
an io_min <= SECTOR_SIZE when deciding if we have a striped stacked device.

Nilay reports that this causes a problem when the physical block size is
greater than SECTOR_SIZE [0].

Furthermore, io_min may be mutated when stacking devices, and this makes
it a poor candidate to hold the stripe size. Such an example would be
when the io_min is less than the physical block size.

Use chunk_sectors to hold the stripe size, which is more appropriate.

[0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df771@linux.ibm.com/T/#mecca17129f72811137d3c2f1e477634e77f06781

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..5b0f1a854e81 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -594,11 +594,13 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
 static bool blk_stack_atomic_writes_head(struct queue_limits *t,
 				struct queue_limits *b)
 {
+	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;
+
 	if (b->atomic_write_hw_boundary &&
 	    !blk_stack_atomic_writes_boundary_head(t, b))
 		return false;
 
-	if (t->io_min <= SECTOR_SIZE) {
+	if (!t->chunk_sectors) {
 		/* No chunk sectors, so use bottom device values directly */
 		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
 		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
@@ -617,12 +619,12 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
 	 * aligned with both limits, i.e. 8K in this example.
 	 */
 	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
-	while (t->io_min % t->atomic_write_hw_unit_max)
+	while (chunk_size % t->atomic_write_hw_unit_max)
 		t->atomic_write_hw_unit_max /= 2;
 
 	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
 					  t->atomic_write_hw_unit_max);
-	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
+	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
 
 	return true;
 }
-- 
2.31.1
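
For readers following the thread, the halving loop in the patch can be tried out in isolation. The following is only a userspace sketch, not kernel code: stacked_unit_max() is a hypothetical helper name, the limits are passed as plain integers instead of struct queue_limits, and the bottom device's unit max is assumed to be a non-zero power of two.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, matching the kernel definition */

/*
 * Hypothetical helper (not a kernel function): starting from the bottom
 * device's atomic write unit max in bytes, halve it until it divides the
 * chunk size evenly. A chunk_sectors of zero means "no stripe", so the
 * bottom value is used unchanged, as in the !t->chunk_sectors branch.
 */
static uint32_t stacked_unit_max(uint32_t chunk_sectors, uint32_t unit_max_bytes)
{
	/* 64-bit to avoid overflow in this sketch; the patch uses unsigned int */
	uint64_t chunk_size = (uint64_t)chunk_sectors << SECTOR_SHIFT;

	/* Assumption of this sketch: unit max is a non-zero power of two */
	assert(unit_max_bytes != 0);

	if (!chunk_sectors)
		return unit_max_bytes;

	/* Ends at 1 in the worst case, since every size is a multiple of 1 */
	while (chunk_size % unit_max_bytes)
		unit_max_bytes /= 2;

	return unit_max_bytes;
}
```

With a 24K chunk (48 sectors) and a bottom unit max of 16K, stacked_unit_max(48, 16384) gives 8192, which matches the "8K in this example" case mentioned in the code comment.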
On 6/5/25 8:38 PM, John Garry wrote:
> The atomic write unit max is limited by the stripe size of any stacked
> device.
>
> It is required that the atomic write unit is a power-of-2 factor of the
> stripe size.
>
> Currently we use the io_min limit to hold the stripe size, and check for
> an io_min <= SECTOR_SIZE when deciding if we have a striped stacked device.
>
> Nilay reports that this causes a problem when the physical block size is
> greater than SECTOR_SIZE [0].
>
> Furthermore, io_min may be mutated when stacking devices, and this makes
> it a poor candidate to hold the stripe size. Such an example would be
> when the io_min is less than the physical block size.
>
> Use chunk_sectors to hold the stripe size, which is more appropriate.
>
> [0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df771@linux.ibm.com/T/#mecca17129f72811137d3c2f1e477634e77f06781
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> block/blk-settings.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a000daafbfb4..5b0f1a854e81 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -594,11 +594,13 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
>  static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>  				struct queue_limits *b)
>  {
> +	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;
> +
>  	if (b->atomic_write_hw_boundary &&
>  	    !blk_stack_atomic_writes_boundary_head(t, b))
>  		return false;
>  
> -	if (t->io_min <= SECTOR_SIZE) {
> +	if (!t->chunk_sectors) {
>  		/* No chunk sectors, so use bottom device values directly */
>  		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
>  		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
> @@ -617,12 +619,12 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>  	 * aligned with both limits, i.e. 8K in this example.
>  	 */
>  	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
> -	while (t->io_min % t->atomic_write_hw_unit_max)
> +	while (chunk_size % t->atomic_write_hw_unit_max)
>  		t->atomic_write_hw_unit_max /= 2;
>  
>  	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
>  					  t->atomic_write_hw_unit_max);
> -	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
> +	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
>  
>  	return true;
>  }

This works well with my NVMe disk, which supports atomic writes. However, my
only concern is: what if t->chunk_sectors is also set for an NVMe disk?
I see that nvme_set_chunk_sectors() initializes chunk_sectors for NVMe.
The value assigned to lim->chunk_sectors in nvme_set_chunk_sectors()
represents "noiob" (i.e. Namespace Optimal I/O Boundary). My disk has "noiob"
set to zero, but if it were non-zero, would it break the above logic
for NVMe atomic writes?

Thanks,
--Nilay
On 06/06/2025 16:23, Nilay Shroff wrote:
>>  	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
>>  					  t->atomic_write_hw_unit_max);
>> -	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
>> +	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
>>
>>  	return true;
>>  }
> This works well with my NVMe disk which supports atomic writes however the only
> concern is what if in case t->chunk_sectors is also defined for NVMe disk?
> I see that nvme_set_chunk_sectors() initializes the chunk_sectors for NVMe.
> The value which is assigned to lim->chunk_sectors in nvme_set_chunk_sectors()
> represents "noiob" (i.e. Namespace Optimal I/O Boundary). My disk has "noiob"
> set to zero but in case if it's non-zero then would it break the above logic
> for NVMe atomic writes?

Yeah, I think that I need to change the code to account for the bottom
device setting chunk_sectors.

Thanks,
John
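
The concern raised above can be made concrete with a small userspace sketch. It is not kernel code and it rests on assumptions: top_hw_max() and noiob_examples() are hypothetical names, the top-level chunk_sectors is assumed to end up equal to the bottom NVMe device's NOIOB-derived value, and the no-chunk branch is assumed to copy the bottom device's atomic_write_hw_max unchanged. It models only the final min() from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Hypothetical model of the last assignment in the patch:
 *   t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
 * Assumption: with chunk_sectors == 0 the bottom value is used as-is.
 */
static uint32_t top_hw_max(uint32_t top_chunk_sectors, uint32_t bottom_hw_max)
{
	uint64_t chunk_size = (uint64_t)top_chunk_sectors << SECTOR_SHIFT;

	if (!top_chunk_sectors)
		return bottom_hw_max;

	return chunk_size < bottom_hw_max ? (uint32_t)chunk_size : bottom_hw_max;
}

/* Example values are made up for illustration, not taken from real hardware */
static void noiob_examples(void)
{
	/* NOIOB of zero: nothing looks like a stripe, limit passes through */
	assert(top_hw_max(0, 256 * 1024) == 256 * 1024);

	/* NOIOB of 256 sectors (128K): treated like a stripe size, limit capped */
	assert(top_hw_max(256, 256 * 1024) == 128 * 1024);
}
```

Under those assumptions, a non-zero NOIOB would be interpreted as a stripe size even though no striped device is involved, which is the case John says the code needs to account for.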