dma_opt_mapping_size() currently returns min(dma_max_mapping_size(),
SIZE_MAX) when neither an IOMMU nor a DMA ops opt_mapping_size callback
is present. That value is the DMA maximum, not an optimal transfer
size, yet callers treat it as a genuine optimization hint.
The concrete problem shows up on SAS controllers (e.g. mpt3sas) running
with IOMMU in passthrough mode. The bogus value propagates through
scsi_transport_sas into Scsi_Host.opt_sectors and then into the block
device's optimal_io_size. mkfs.xfs picks it up, computes
swidth=4095 / sunit=2, and fails with:
XFS: SB stripe unit sanity check failed
making it impossible to create filesystems during system bootstrap.
Patch 1 changes dma_opt_mapping_size() to return 0 ("no preference")
when no backend provides a real hint.
Patch 2 adjusts the only other in-tree caller (nvme-pci) to handle the
new 0 return value, falling back to its existing default instead of
setting max_hw_sectors to 0.
Note: the scsi_transport_sas caller (the one that triggers the XFS
issue) already handles 0 safely. It passes the return value through
min_t() into shost->opt_sectors, which becomes 0; sd.c then feeds that
into min_not_zero() when computing io_opt, so a zero opt_sectors is
correctly treated as "no preference" and ignored.
Based on linux-next (next-20260316).
Ionut Nechita (2):
dma: return 0 from dma_opt_mapping_size() when no real hint exists
nvme-pci: handle dma_opt_mapping_size() returning 0
drivers/nvme/host/pci.c | 15 ++++++++++-----
kernel/dma/mapping.c | 13 ++++++++-----
2 files changed, 18 insertions(+), 10 deletions(-)
--
2.53.0
On 16/03/2026 20:39, Ionut Nechita (Wind River) wrote:
> dma_opt_mapping_size() currently returns min(dma_max_mapping_size(),
> SIZE_MAX) when neither an IOMMU nor a DMA ops opt_mapping_size callback
> is present. That value is the DMA maximum, not an optimal transfer
> size, yet callers treat it as a genuine optimization hint.
>
> The concrete problem shows up on SAS controllers (e.g. mpt3sas) running
> with IOMMU in passthrough mode. The bogus value propagates through
> scsi_transport_sas into Scsi_Host.opt_sectors and then into the block
> device's optimal_io_size. mkfs.xfs picks it up, computes
> swidth=4095 / sunit=2, and fails with:
>
> XFS: SB stripe unit sanity check failed
>
> making it impossible to create filesystems during system bootstrap.
For SAS controllers, don't we limit shost->opt_sectors at
shost->max_sectors, and then in sd_revalidate_disk() this value is
ignored as sdkp->opt_xfer_blocks would be smaller, right?
What value are you seeing for max_sectors and opt_sectors? That mpt3sas
driver seems to have many methods to set max_sectors.
Thanks,
John
>
> Patch 1 changes dma_opt_mapping_size() to return 0 ("no preference")
> when no backend provides a real hint.
>
> Patch 2 adjusts the only other in-tree caller (nvme-pci) to handle the
> new 0 return value, falling back to its existing default instead of
> setting max_hw_sectors to 0.
>
> Note: the scsi_transport_sas caller (the one that triggers the XFS
> issue) already handles 0 safely. It passes the return value through
> min_t() into shost->opt_sectors, which becomes 0; sd.c then feeds that
> into min_not_zero() when computing io_opt, so a zero opt_sectors is
> correctly treated as "no preference" and ignored.
>
> Based on linux-next (next-20260316).
>
> Ionut Nechita (2):
> dma: return 0 from dma_opt_mapping_size() when no real hint exists
> nvme-pci: handle dma_opt_mapping_size() returning 0
>
> drivers/nvme/host/pci.c | 15 ++++++++++-----
> kernel/dma/mapping.c | 13 ++++++++-----
> 2 files changed, 18 insertions(+), 10 deletions(-)
>
On Tue, Mar 17, 2026 at 09:11:59AM +0000, John Garry wrote:
> For SAS controllers, don't we limit shost->opt_sectors at
> shost->max_sectors, and then in sd_revalidate_disk() this value is
> ignored as sdkp->opt_xfer_blocks would be smaller, right?

That assumes opt_xfer_blocks is actually set. It's an optional and
relatively recent SCSI feature. So don't expect crappy SSDs or RAID
controllers faking up SCSI in shitty firmware to actually set it.
On 17/03/2026 14:36, Christoph Hellwig wrote:
> On Tue, Mar 17, 2026 at 09:11:59AM +0000, John Garry wrote:
>> For SAS controllers, don't we limit shost->opt_sectors at
>> shost->max_sectors, and then in sd_revalidate_disk() this value is
>> ignored as sdkp->opt_xfer_blocks would be smaller, right?
> That assumes opt_xfer_blocks is actually set. It's an optional and
> relatively recent SCSI feature. So don't expect crappy SSDs or
> RAID controllers faking up SCSI in shitty firmware to actually set
> it.

Sure, and then we would have io_opt at max_sectors, and it seems that
value is totally configurable for that HBA driver.

However I still find the values reported strange:

swidth=4095 / sunit=2

I thought that they were from io_opt and io_min, and
blk_validate_limits() does rounding to PBS, except io_min has no
rounding for > PBS.
On 3/17/26 18:11, John Garry wrote:
> On 16/03/2026 20:39, Ionut Nechita (Wind River) wrote:
>> dma_opt_mapping_size() currently returns min(dma_max_mapping_size(),
>> SIZE_MAX) when neither an IOMMU nor a DMA ops opt_mapping_size callback
>> is present. That value is the DMA maximum, not an optimal transfer
>> size, yet callers treat it as a genuine optimization hint.
>>
>> The concrete problem shows up on SAS controllers (e.g. mpt3sas) running
>> with IOMMU in passthrough mode. The bogus value propagates through
>> scsi_transport_sas into Scsi_Host.opt_sectors and then into the block
>> device's optimal_io_size. mkfs.xfs picks it up, computes
>> swidth=4095 / sunit=2, and fails with:
>>
>> XFS: SB stripe unit sanity check failed
>>
>> making it impossible to create filesystems during system bootstrap.
>
> For SAS controllers, don't we limit shost->opt_sectors at
> shost->max_sectors, and then in sd_revalidate_disk() this value is
> ignored as sdkp->opt_xfer_blocks would be smaller, right?
>
> What value are you seeing for max_sectors and opt_sectors? That mpt3sas
> driver seems to have many methods to set max_sectors.
And mpi3mr is also very similar.
>
> Thanks,
> John
>
>>
>> Patch 1 changes dma_opt_mapping_size() to return 0 ("no preference")
>> when no backend provides a real hint.
>>
>> Patch 2 adjusts the only other in-tree caller (nvme-pci) to handle the
>> new 0 return value, falling back to its existing default instead of
>> setting max_hw_sectors to 0.
>>
>> Note: the scsi_transport_sas caller (the one that triggers the XFS
>> issue) already handles 0 safely. It passes the return value through
>> min_t() into shost->opt_sectors, which becomes 0; sd.c then feeds that
>> into min_not_zero() when computing io_opt, so a zero opt_sectors is
>> correctly treated as "no preference" and ignored.
>>
>> Based on linux-next (next-20260316).
>>
>> Ionut Nechita (2):
>> dma: return 0 from dma_opt_mapping_size() when no real hint exists
>> nvme-pci: handle dma_opt_mapping_size() returning 0
>>
>> drivers/nvme/host/pci.c | 15 ++++++++++-----
>> kernel/dma/mapping.c | 13 ++++++++-----
>> 2 files changed, 18 insertions(+), 10 deletions(-)
>>
>
>
--
Damien Le Moal
Western Digital Research