[PATCH v1 0/2] dma: fix dma_opt_mapping_size() returning bogus value when no backend hint exists

Ionut Nechita (Wind River) posted 2 patches 3 weeks ago
drivers/nvme/host/pci.c | 15 ++++++++++-----
kernel/dma/mapping.c    | 13 ++++++++-----
2 files changed, 18 insertions(+), 10 deletions(-)
Posted by Ionut Nechita (Wind River) 3 weeks ago
dma_opt_mapping_size() currently returns min(dma_max_mapping_size(),
SIZE_MAX) when neither an IOMMU nor a DMA ops opt_mapping_size callback
is present.  That value is the DMA maximum, not an optimal transfer
size, yet callers treat it as a genuine optimization hint.

The concrete problem shows up on SAS controllers (e.g. mpt3sas) running
with IOMMU in passthrough mode.  The bogus value propagates through
scsi_transport_sas into Scsi_Host.opt_sectors and then into the block
device's optimal_io_size.  mkfs.xfs picks it up, computes
swidth=4095 / sunit=2, and fails with:

  XFS: SB stripe unit sanity check failed

making it impossible to create filesystems during system bootstrap.
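
To make the failure concrete, here is a minimal user-space sketch of why the reported geometry is rejected. It assumes the sanity check requires the stripe width to be at least the stripe unit and a whole multiple of it; model_stripe_geometry_ok() is a hypothetical helper written for illustration, not the XFS function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a stripe geometry sanity check.
 * Assumption: a geometry is valid only if the width is at least the unit
 * and is a whole multiple of it. A unit of 0 means "no stripe geometry".
 */
static bool model_stripe_geometry_ok(uint32_t sunit, uint32_t swidth)
{
	if (sunit == 0)
		return true;            /* nothing to validate */
	if (sunit > swidth)
		return false;           /* unit larger than width */
	return (swidth % sunit) == 0;   /* width must be a multiple of unit */
}
```

Under that assumption, swidth=4095 with sunit=2 fails because 4095 is odd, while 4096 with sunit=2 would pass.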

Patch 1 changes dma_opt_mapping_size() to return 0 ("no preference")
when no backend provides a real hint.
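
The behaviour change can be sketched in user-space C as follows. This is a simplified model, not the kernel code: struct model_dev and both functions are hypothetical, and a backend_opt_hint of 0 stands in for "no IOMMU and no opt_mapping_size callback". Whether the patched function still clamps a real hint to the DMA maximum is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct model_dev {
	size_t max_mapping_size;  /* what dma_max_mapping_size() would report */
	size_t backend_opt_hint;  /* 0 = no backend provides a hint */
};

static size_t model_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Old behaviour: with no hint, the result is min(max, SIZE_MAX) = max. */
static size_t model_opt_mapping_size_old(const struct model_dev *dev)
{
	size_t size = dev->backend_opt_hint ? dev->backend_opt_hint : SIZE_MAX;

	return model_min(dev->max_mapping_size, size);
}

/* Proposed behaviour: with no hint, return 0 ("no preference"). */
static size_t model_opt_mapping_size_new(const struct model_dev *dev)
{
	if (dev->backend_opt_hint == 0)
		return 0;
	/* Assumption: a real hint is still clamped to the DMA maximum. */
	return model_min(dev->max_mapping_size, dev->backend_opt_hint);
}
```

In this model a device without a hint reports its DMA maximum under the old logic and 0 under the new one, while a device with a real hint gets the same answer from both.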

Patch 2 adjusts the only other in-tree caller (nvme-pci) to handle the
new 0 return value, falling back to its existing default instead of
setting max_hw_sectors to 0.
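
A caller-side fallback of this kind might look like the sketch below. It is a user-space model under stated assumptions, not the nvme-pci change itself: MODEL_DEFAULT_MAX_BYTES is a hypothetical placeholder for the driver's existing default, and model_max_hw_sectors() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9
/* Hypothetical placeholder for the driver's existing default limit. */
#define MODEL_DEFAULT_MAX_BYTES (8u * 1024 * 1024)

/*
 * Derive a max_hw_sectors value from an optimal-mapping-size hint in bytes.
 * A hint that amounts to zero sectors is treated as "no preference" and
 * the default is used instead of propagating 0.
 */
static uint32_t model_max_hw_sectors(size_t opt_bytes)
{
	uint32_t def = MODEL_DEFAULT_MAX_BYTES >> MODEL_SECTOR_SHIFT;
	size_t opt_sectors = opt_bytes >> MODEL_SECTOR_SHIFT;

	if (opt_sectors == 0)
		return def;                       /* no hint: keep default */
	return opt_sectors < def ? (uint32_t)opt_sectors : def;
}
```

Checking the sector count rather than the byte count means a hint smaller than one sector also falls back to the default; that is a choice made for this sketch.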

Note: the scsi_transport_sas caller (the one that triggers the XFS
issue) already handles 0 safely.  It passes the return value through
min_t() into shost->opt_sectors, which becomes 0; sd.c then feeds that
into min_not_zero() when computing io_opt, so a zero opt_sectors is
correctly treated as "no preference" and ignored.
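
The reasoning in this note can be modelled in a few lines of user-space C. model_min_not_zero() mirrors the documented semantics of the kernel's min_not_zero() for unsigned values; model_io_opt_bytes() is a hypothetical simplification of how sd.c combines the host hint with the device-reported optimal transfer size, with both arguments already converted to bytes.

```c
#include <stdint.h>

/*
 * Model of min_not_zero(): the smaller of two values, ignoring zeros.
 * Returns 0 only when both inputs are 0.
 */
static uint32_t model_min_not_zero(uint32_t a, uint32_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Hypothetical simplification of the io_opt computation: combine the
 * host-level hint with the device-level hint.
 */
static uint32_t model_io_opt_bytes(uint32_t host_opt_bytes,
				   uint32_t dev_opt_bytes)
{
	return model_min_not_zero(host_opt_bytes, dev_opt_bytes);
}
```

With a host hint of 0 the device value passes through unchanged, and if the device reports nothing either, io_opt stays 0, meaning no optimal I/O size is advertised.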

Based on linux-next (next-20260316).

Ionut Nechita (2):
  dma: return 0 from dma_opt_mapping_size() when no real hint exists
  nvme-pci: handle dma_opt_mapping_size() returning 0

 drivers/nvme/host/pci.c | 15 ++++++++++-----
 kernel/dma/mapping.c    | 13 ++++++++-----
 2 files changed, 18 insertions(+), 10 deletions(-)

-- 
2.53.0
Re: [PATCH v1 0/2] dma: fix dma_opt_mapping_size() returning bogus value when no backend hint exists
Posted by John Garry 2 weeks, 6 days ago
On 16/03/2026 20:39, Ionut Nechita (Wind River) wrote:
> dma_opt_mapping_size() currently returns min(dma_max_mapping_size(),
> SIZE_MAX) when neither an IOMMU nor a DMA ops opt_mapping_size callback
> is present.  That value is the DMA maximum, not an optimal transfer
> size, yet callers treat it as a genuine optimization hint.
> 
> The concrete problem shows up on SAS controllers (e.g. mpt3sas) running
> with IOMMU in passthrough mode.  The bogus value propagates through
> scsi_transport_sas into Scsi_Host.opt_sectors and then into the block
> device's optimal_io_size.  mkfs.xfs picks it up, computes
> swidth=4095 / sunit=2, and fails with:
> 
>    XFS: SB stripe unit sanity check failed
> 
> making it impossible to create filesystems during system bootstrap.

For SAS controllers, don't we limit shost->opt_sectors at 
shost->max_sectors, and then in sd_revalidate_disk() this value is 
ignored as sdkp->opt_xfer_blocks would be smaller, right?

What value are you seeing for max_sectors and opt_sectors? That mpt3sas 
driver seems to have many methods to set max_sectors.

Thanks,
John

> 
> Patch 1 changes dma_opt_mapping_size() to return 0 ("no preference")
> when no backend provides a real hint.
> 
> Patch 2 adjusts the only other in-tree caller (nvme-pci) to handle the
> new 0 return value, falling back to its existing default instead of
> setting max_hw_sectors to 0.
> 
> Note: the scsi_transport_sas caller (the one that triggers the XFS
> issue) already handles 0 safely.  It passes the return value through
> min_t() into shost->opt_sectors, which becomes 0; sd.c then feeds that
> into min_not_zero() when computing io_opt, so a zero opt_sectors is
> correctly treated as "no preference" and ignored.
> 
> Based on linux-next (next-20260316).
> 
> Ionut Nechita (2):
>    dma: return 0 from dma_opt_mapping_size() when no real hint exists
>    nvme-pci: handle dma_opt_mapping_size() returning 0
> 
>   drivers/nvme/host/pci.c | 15 ++++++++++-----
>   kernel/dma/mapping.c    | 13 ++++++++-----
>   2 files changed, 18 insertions(+), 10 deletions(-)
>
Re: [PATCH v1 0/2] dma: fix dma_opt_mapping_size() returning bogus value when no backend hint exists
Posted by Christoph Hellwig 2 weeks, 6 days ago
On Tue, Mar 17, 2026 at 09:11:59AM +0000, John Garry wrote:
> For SAS controllers, don't we limit shost->opt_sectors at 
> shost->max_sectors, and then in sd_revalidate_disk() this value is ignored 
> as sdkp->opt_xfer_blocks would be smaller, right?

That assumes opt_xfer_blocks is actually set.  It's an optional and
relatively recent SCSI feature.  So don't expect crappy SSDs or
RAID controllers faking up SCSI in shitty firmware to actually set
it.
Re: [PATCH v1 0/2] dma: fix dma_opt_mapping_size() returning bogus value when no backend hint exists
Posted by John Garry 2 weeks, 6 days ago
On 17/03/2026 14:36, Christoph Hellwig wrote:
> On Tue, Mar 17, 2026 at 09:11:59AM +0000, John Garry wrote:
>> For SAS controllers, don't we limit shost->opt_sectors at
>> shost->max_sectors, and then in sd_revalidate_disk() this value is ignored
>> as sdkp->opt_xfer_blocks would be smaller, right?
> That assumes opt_xfer_blocks is actually set.  It's an optional and
> relatively recent SCSI feature.  So don't expect crappy SSDs or
> RAID controllers faking up SCSI in shitty firmware to actually set
> it.

Sure, and then we would have io_opt at max_sectors, and it seems that 
value is totally configurable for that HBA driver.

However I still find the values reported strange:
swidth=4095 / sunit=2

I thought those values came from io_opt and io_min. blk_validate_limits() 
rounds io_opt to the physical block size (PBS), but io_min is not rounded 
when it is larger than the PBS.
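
For reference, the rounding described above can be modelled roughly as follows. This is a user-space sketch based on my reading of blk_validate_limits(), not a copy of it: struct model_limits and model_validate_io_hints() are hypothetical names, all sizes are in bytes, and the physical block size is assumed to be non-zero.

```c
#include <stdint.h>

/* Hypothetical subset of the queue limits; all values in bytes. */
struct model_limits {
	uint32_t physical_block_size; /* assumed non-zero */
	uint32_t io_min;
	uint32_t io_opt;
};

/*
 * Assumed behaviour: io_min is raised to the PBS if it is smaller but is
 * otherwise left alone; io_opt is rounded down to a multiple of the PBS.
 */
static void model_validate_io_hints(struct model_limits *lim)
{
	if (lim->io_min < lim->physical_block_size)
		lim->io_min = lim->physical_block_size;

	lim->io_opt -= lim->io_opt % lim->physical_block_size;
}
```

Under these assumptions, an io_opt of 4095 sectors (2096640 bytes) is left untouched when the PBS is 512 bytes and is only rounded down when the PBS is larger, for example to 2093056 bytes with a 4096-byte PBS.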
Re: [PATCH v1 0/2] dma: fix dma_opt_mapping_size() returning bogus value when no backend hint exists
Posted by Damien Le Moal 2 weeks, 6 days ago
On 3/17/26 18:11, John Garry wrote:
> On 16/03/2026 20:39, Ionut Nechita (Wind River) wrote:
>> dma_opt_mapping_size() currently returns min(dma_max_mapping_size(),
>> SIZE_MAX) when neither an IOMMU nor a DMA ops opt_mapping_size callback
>> is present.  That value is the DMA maximum, not an optimal transfer
>> size, yet callers treat it as a genuine optimization hint.
>>
>> The concrete problem shows up on SAS controllers (e.g. mpt3sas) running
>> with IOMMU in passthrough mode.  The bogus value propagates through
>> scsi_transport_sas into Scsi_Host.opt_sectors and then into the block
>> device's optimal_io_size.  mkfs.xfs picks it up, computes
>> swidth=4095 / sunit=2, and fails with:
>>
>>    XFS: SB stripe unit sanity check failed
>>
>> making it impossible to create filesystems during system bootstrap.
> 
> For SAS controllers, don't we limit shost->opt_sectors at 
> shost->max_sectors, and then in sd_revalidate_disk() this value is 
> ignored as sdkp->opt_xfer_blocks would be smaller, right?
> 
> What value are you seeing for max_sectors and opt_sectors? That mpt3sas 
> driver seems to have many methods to set max_sectors.

And mpi3mr is also very similar.

> 
> Thanks,
> John
> 
>>
>> Patch 1 changes dma_opt_mapping_size() to return 0 ("no preference")
>> when no backend provides a real hint.
>>
>> Patch 2 adjusts the only other in-tree caller (nvme-pci) to handle the
>> new 0 return value, falling back to its existing default instead of
>> setting max_hw_sectors to 0.
>>
>> Note: the scsi_transport_sas caller (the one that triggers the XFS
>> issue) already handles 0 safely.  It passes the return value through
>> min_t() into shost->opt_sectors, which becomes 0; sd.c then feeds that
>> into min_not_zero() when computing io_opt, so a zero opt_sectors is
>> correctly treated as "no preference" and ignored.
>>
>> Based on linux-next (next-20260316).
>>
>> Ionut Nechita (2):
>>    dma: return 0 from dma_opt_mapping_size() when no real hint exists
>>    nvme-pci: handle dma_opt_mapping_size() returning 0
>>
>>   drivers/nvme/host/pci.c | 15 ++++++++++-----
>>   kernel/dma/mapping.c    | 13 ++++++++-----
>>   2 files changed, 18 insertions(+), 10 deletions(-)
>>
> 
> 


-- 
Damien Le Moal
Western Digital Research