[PATCHv2 1/7] block: check for valid bio while splitting

Keith Busch posted 7 patches 2 months ago
[PATCHv2 1/7] block: check for valid bio while splitting
Posted by Keith Busch 2 months ago
From: Keith Busch <kbusch@kernel.org>

We're already iterating every segment, so check these for a valid IO at
the same time.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-merge.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 70d704615be52..81bdad915699a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -298,6 +298,9 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
 	unsigned nsegs = 0, bytes = 0;
 
 	bio_for_each_bvec(bv, bio, iter) {
+		if (bv.bv_offset & lim->dma_alignment)
+			return -EINVAL;
+
 		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
@@ -341,6 +344,8 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
 	 * we do not use the full hardware limits.
 	 */
 	bytes = ALIGN_DOWN(bytes, bio_split_alignment(bio, lim));
+	if (!bytes)
+		return -EINVAL;
 
 	/*
 	 * Bio splitting may cause subtle trouble such as hang when doing sync
-- 
2.47.3
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Keith Busch 1 month, 3 weeks ago
On Tue, Aug 05, 2025 at 07:11:17AM -0700, Keith Busch wrote:
>  	bio_for_each_bvec(bv, bio, iter) {
> +		if (bv.bv_offset & lim->dma_alignment)
> +			return -EINVAL;

I have a question about this part here because testing with various scsi
is showing odd results.

NVMe wants this check to actually be this:

		if ((bv.bv_offset | bv.bv_len) & lim->dma_alignment)

because the dma alignment defines not only the starting address offset,
but also the length. NVMe's default alignment is 4 bytes.

But I can't make that change because many scsi devices don't set the dma
alignment and get the default 511 value. This is fine for the memory
address offset, but the lengths sent for various inquriy commands are
much smaller, like 4 and 32 byte lengths. That length wouldn't pass the
dma alignment granularity, so I think the default value is far too
conservative. Does the address start size need to be a different limit
than minimum length? I feel like they should be the same, but maybe
that's just an nvme thing.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Martin K. Petersen 1 month, 3 weeks ago
Hi Keith!

> NVMe wants this check to actually be this:
>
> 		if ((bv.bv_offset | bv.bv_len) & lim->dma_alignment)
>
> because the dma alignment defines not only the starting address offset,
> but also the length. NVMe's default alignment is 4 bytes.

dma_alignment defines the alignment of the DMA starting address. We
don't have a dedicated queue limit for the transfer length granularity
described by NVMe.

-- 
Martin K. Petersen
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Keith Busch 1 month, 3 weeks ago
On Wed, Aug 13, 2025 at 05:23:47PM -0400, Martin K. Petersen wrote:
> dma_alignment defines the alignment of the DMA starting address. We
> don't have a dedicated queue limit for the transfer length granularity
> described by NVMe.

Darn, but thanks for confirming. I'll see if I can get by without
needing a new limit, or look into adding one if not. Worst case, I can
also let the device return the error, though I think we prefer not to
send an IO that we know should fail.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Christoph Hellwig 1 month, 2 weeks ago
On Wed, Aug 13, 2025 at 03:52:44PM -0600, Keith Busch wrote:
> On Wed, Aug 13, 2025 at 05:23:47PM -0400, Martin K. Petersen wrote:
> > dma_alignment defines the alignment of the DMA starting address. We
> > don't have a dedicated queue limit for the transfer length granularity
> > described by NVMe.
> 
> Darn, but thanks for confirming. I'll see if I can get by without
> needing a new limit, or look into adding one if not. Worst case, I can
> also let the device return the error, though I think we prefer not to
> send an IO that we know should fail.

Allowing an unprivileged user application to trivially trigger an I/O
error sounds like a bad idea.

But I think simply applying the dma_alignment to the length of
READ/WRITE commands might be work, while ignoring it for other types
of commands.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Bart Van Assche 1 month, 3 weeks ago
On 8/13/25 1:06 PM, Keith Busch wrote:
> But I can't make that change because many scsi devices don't set the dma
> alignment and get the default 511 value. This is fine for the memory
> address offset, but the lengths sent for various inquriy commands are
> much smaller, like 4 and 32 byte lengths. That length wouldn't pass the
> dma alignment granularity, so I think the default value is far too
> conservative. Does the address start size need to be a different limit
> than minimum length? I feel like they should be the same, but maybe
> that's just an nvme thing.

Hi Keith,

Maybe I misunderstood your question. It seems to me that the SCSI core
sets the DMA alignment by default to four bytes. From
drivers/scsi/hosts.c:

	/* 32-byte (dword) is a common minimum for HBAs. */
	if (sht->dma_alignment)
		shost->dma_alignment = sht->dma_alignment;
	else
		shost->dma_alignment = 3;

Bart.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Keith Busch 1 month, 3 weeks ago
On Wed, Aug 13, 2025 at 01:41:49PM -0700, Bart Van Assche wrote:
> On 8/13/25 1:06 PM, Keith Busch wrote:
> > But I can't make that change because many scsi devices don't set the dma
> > alignment and get the default 511 value. This is fine for the memory
> > address offset, but the lengths sent for various inquriy commands are
> > much smaller, like 4 and 32 byte lengths. That length wouldn't pass the
> > dma alignment granularity, so I think the default value is far too
> > conservative. Does the address start size need to be a different limit
> > than minimum length? I feel like they should be the same, but maybe
> > that's just an nvme thing.
> 
> Hi Keith,
> 
> Maybe I misunderstood your question. It seems to me that the SCSI core
> sets the DMA alignment by default to four bytes. From
> drivers/scsi/hosts.c:

Thanks, I think you got my meaning. 

I'm using the AHCI driver. It looks like ata_scsi_dev_config() overrides
the dma_alignment to sector_size - 1, and that pattern goes way back,
almost 20 years ago, so maybe I can't change it.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Damien Le Moal 1 month, 3 weeks ago
On 8/14/25 6:01 AM, Keith Busch wrote:
> On Wed, Aug 13, 2025 at 01:41:49PM -0700, Bart Van Assche wrote:
>> On 8/13/25 1:06 PM, Keith Busch wrote:
>>> But I can't make that change because many scsi devices don't set the dma
>>> alignment and get the default 511 value. This is fine for the memory
>>> address offset, but the lengths sent for various inquriy commands are
>>> much smaller, like 4 and 32 byte lengths. That length wouldn't pass the
>>> dma alignment granularity, so I think the default value is far too
>>> conservative. Does the address start size need to be a different limit
>>> than minimum length? I feel like they should be the same, but maybe
>>> that's just an nvme thing.
>>
>> Hi Keith,
>>
>> Maybe I misunderstood your question. It seems to me that the SCSI core
>> sets the DMA alignment by default to four bytes. From
>> drivers/scsi/hosts.c:
> 
> Thanks, I think you got my meaning. 
> 
> I'm using the AHCI driver. It looks like ata_scsi_dev_config() overrides
> the dma_alignment to sector_size - 1, and that pattern goes way back,
> almost 20 years ago, so maybe I can't change it.

That is probably buggy now in the sense that the scsi layer should be able to
send any command with a size not aligned to the LBA size or ATA sector (512 B)
and libata-scsi SAT should do the translation using an internal 512B aligned
command size.

What makes a mess here is that SCSI allows having a media-access command
specifying a transfer size that is not aligned on the LBA size. The transfer
will be "short" in that case, which is perfectly fine with SCSI. But ATA does
not allow that. It is all or nothing and the command size thus must always be
aligned to the LBA size.

I think that dma_alignment was abused to check that. But I think it should not
be too hard to check the alignment in libata-scsi when translating the command.
SAS HBAs should be doing something similar too. Have never exactly tested that
though, and I am afraid how many SAS HBAs will not like unaligned command to
ATA devices...

We also have the different alignment for management commands (most of which use
512B sector size) and media access commands which use the actual device LBA
size alignment.

So it is a mess :)

-- 
Damien Le Moal
Western Digital Research
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Bart Van Assche 1 month, 3 weeks ago
On 8/13/25 2:01 PM, Keith Busch wrote:
> I'm using the AHCI driver. It looks like ata_scsi_dev_config() overrides
> the dma_alignment to sector_size - 1, and that pattern goes way back,
> almost 20 years ago, so maybe I can't change it.

 From an AHCI specification
(https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/serial-ata-ahci-spec-rev1-3-1.pdf):
"Data Base Address (DBA): Indicates the 32-bit physical address of the 
data block. The block must be word aligned, indicated by bit 0 being 
reserved."

Please note that I have no experience with programming AHCI controllers.

Bart.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Christoph Hellwig 1 month, 3 weeks ago
On Tue, Aug 05, 2025 at 07:11:17AM -0700, Keith Busch wrote:
> @@ -341,6 +344,8 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
>  	 * we do not use the full hardware limits.
>  	 */
>  	bytes = ALIGN_DOWN(bytes, bio_split_alignment(bio, lim));
> +	if (!bytes)
> +		return -EINVAL;

How is this related to the other hunk and the patch description?
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Keith Busch 1 month, 3 weeks ago
On Sun, Aug 10, 2025 at 07:37:36AM -0700, Christoph Hellwig wrote:
> On Tue, Aug 05, 2025 at 07:11:17AM -0700, Keith Busch wrote:
> > @@ -341,6 +344,8 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
> >  	 * we do not use the full hardware limits.
> >  	 */
> >  	bytes = ALIGN_DOWN(bytes, bio_split_alignment(bio, lim));
> > +	if (!bytes)
> > +		return -EINVAL;
> 
> How is this related to the other hunk and the patch description?

The patchset allows you to submit an io with vectors that are partial
logical blocks. Misuse could create a bio that exceeds the device max
vectors or introduces virtual boundary gaps, requiring a split into
something that is smaller than a block size. This check catches that.

Quick example: nvme with a 4k logical block size, and the usual 4k
virtual boundary. Send an io with four vectors iov_len=1k. The total
size is block sized, but there's no way that could split into a valid
io. There's a test specifically for this in my reply about xfstests.
Re: [PATCHv2 1/7] block: check for valid bio while splitting
Posted by Christoph Hellwig 1 month, 3 weeks ago
On Sun, Aug 10, 2025 at 09:39:50AM -0600, Keith Busch wrote:
> > >  	bytes = ALIGN_DOWN(bytes, bio_split_alignment(bio, lim));
> > > +	if (!bytes)
> > > +		return -EINVAL;
> > 
> > How is this related to the other hunk and the patch description?
> 
> The patchset allows you to submit an io with vectors that are partial
> logical blocks. Misuse could create a bio that exceeds the device max
> vectors or introduces virtual boundary gaps, requiring a split into
> something that is smaller than a block size. This check catches that.
> 
> Quick example: nvme with a 4k logical block size, and the usual 4k
> virtual boundary. Send an io with four vectors iov_len=1k. The total
> size is block sized, but there's no way that could split into a valid
> io. There's a test specifically for this in my reply about xfstests.

Can you turn the above into a comment explaining the check?