[PATCH 0/3] hw/nvme: lift IOV_MAX limit in DMA path

Daniel Gomez posted 3 patches 1 day, 4 hours ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260529-align-nvme-mdts-with-linux-v1-0-221d4d21ab43@samsung.com
Maintainers: Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Jesper Devantier <foss@defmacro.it>, Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, "Philippe Mathieu-Daudé" <philmd@mailo.com>
Raise the QEMU NVMe controller's MDTS limit from 9 (2 MiB) to 11 (8 MiB)
to match the Linux host driver's NVME_MAX_BYTES.
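
For reference, MDTS is reported as a power of two in units of the
controller's minimum memory page size (CAP.MPSMIN), so with 4 KiB pages
9 gives 2 MiB and 11 gives 8 MiB. A minimal C sketch of that arithmetic,
assuming a 4 KiB MPSMIN; mdts_to_bytes() is a hypothetical helper, not a
QEMU function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: maximum data transfer size in bytes for a given
 * MDTS value, which is a power of two in units of CAP.MPSMIN.
 * Returns 0 for mdts == 0, meaning "no limit reported".
 */
static uint64_t mdts_to_bytes(unsigned mdts, uint64_t mpsmin_bytes)
{
    if (mdts == 0) {
        return 0;
    }
    assert(mdts < 32);             /* conservative bound on the shift count */
    return mpsmin_bytes << mdts;   /* MPSMIN * 2^MDTS */
}
```

With mdts=11 and 4 KiB pages this is 8388608 bytes, i.e. 8192 KiB, the
same figure as max_hw_sectors_kb in the queue limits below.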

Commit 53493c1f83 ("hw/nvme: cap MDTS value for internal limitation")
needed the 2 MiB cap because dma_blk_io() submitted the full sglist
in one preadv()/pwritev() call, which the host kernel rejects with EINVAL
when the iovec count exceeds IOV_MAX. This series moves the IOV_MAX bound
down into dma_blk_cb(), which now splits the sglist into chunks of at most
IOV_MAX entries when necessary.
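
The batching idea can be illustrated outside QEMU with plain pwritev():
pass at most iov_max iovecs per call and advance the file offset between
calls. This is a hypothetical synchronous sketch, not the dma_blk_cb()
change itself (which continues through its existing re-entry path rather
than looping); pwritev_chunked() is an illustrative name, and short writes
are treated as errors to keep it small. On Linux IOV_MAX is 1024.

```c
#define _GNU_SOURCE /* pwritev() and mkstemp() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write niov iovecs at file offset off, passing at
 * most iov_max iovecs to each pwritev() call. Returns the number of
 * pwritev() calls made, or -1 on error or short write. *written receives
 * the total number of bytes written.
 */
static int pwritev_chunked(int fd, const struct iovec *iov, int niov,
                           off_t off, int iov_max, size_t *written)
{
    int calls = 0;

    assert(iov_max > 0);
    *written = 0;
    while (niov > 0) {
        int n = niov < iov_max ? niov : iov_max;   /* size of this chunk */
        size_t want = 0;

        for (int i = 0; i < n; i++) {
            want += iov[i].iov_len;
        }
        ssize_t ret = pwritev(fd, iov, n, off);
        if (ret < 0 || (size_t)ret != want) {
            return -1;                             /* error or short write */
        }
        calls++;
        *written += (size_t)ret;
        off += ret;                                /* next chunk follows on */
        iov += n;
        niov -= n;
    }
    return calls;
}
```

With 2048 iovecs and iov_max = 1024 it issues two pwritev() calls;
handing all 2048 to a single call is the case the kernel rejects.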

Patch 1 makes dma_blk_cb() stop accumulating at IOV_MAX and submit the
next chunk via the existing re-entry path. With that in place, patch 2
removes the DMA-path IOV_MAX guard in nvme_map_addr(), and patch 3 raises
the MDTS cap to 11 via a new NVME_MDTS_MAX and coerces mdts=0 ("unlimited")
to the same limit: the Linux host driver clamps transfers at
NVME_MAX_BYTES = 8 MiB regardless of MDTS, so honoring "unlimited" buys
nothing.
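
A small C sketch of the mdts=0 coercion and what it implies for
dma_blk_cb() in the worst case, assuming 4 KiB pages, a page-aligned
transfer in which every page becomes its own sglist entry, and Linux's
IOV_MAX of 1024. The SKETCH_* constants and both functions are
hypothetical; the series' actual NVME_MDTS_MAX definition and its handling
of values above the maximum are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MDTS_MAX  11u   /* 8 MiB with 4 KiB pages */
#define SKETCH_PAGE_SIZE 4096u /* assumed CAP.MPSMIN */
#define SKETCH_IOV_MAX   1024u /* Linux IOV_MAX */

/* mdts=0 ("unlimited") is coerced to the maximum; other values pass through. */
static unsigned effective_mdts(unsigned mdts)
{
    return mdts == 0 ? SKETCH_MDTS_MAX : mdts;
}

/*
 * Worst-case number of IOV_MAX-sized chunks for a maximum-size transfer,
 * assuming each page of a page-aligned transfer is its own sglist entry.
 */
static unsigned worst_case_chunks(unsigned mdts)
{
    unsigned eff = effective_mdts(mdts);

    assert(eff <= SKETCH_MDTS_MAX);
    uint64_t bytes = (uint64_t)SKETCH_PAGE_SIZE << eff;
    uint64_t niov = bytes / SKETCH_PAGE_SIZE;        /* one iovec per page */
    return (unsigned)((niov + SKETCH_IOV_MAX - 1) / SKETCH_IOV_MAX);
}
```

Under these assumptions an 8 MiB transfer (2048 entries) needs two
submissions, while anything within the old 2 MiB cap (512 entries) still
fits in one.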

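Verification: fio drives 8 MiB buffered random writes against an XFS
filesystem on nvme3n1 while blkalgn traces the I/O reaching the device.
Most I/Os (116797 of 134993) are full 8 MiB transfers, and the queue
limits and id-ctrl output below show max_hw_sectors_kb = 8192 and mdts = 11: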

fio \
    --name=mdts-stress \
    --filename=/mnt/mdts/stress.fio \
    --rw=randwrite \
    --bs=8M \
    --ioengine=psync \
    --direct=0 \
    --numjobs=8 \
    --time_based \
    --runtime=600 \
    --fsync=8 \
    --end_fsync=1 \
    --group_reporting \
    --refill_buffers \
    --norandommap \
    --offset_align=32k

blkalgn-libbpf --disk=nvme3n1 --trace

I/O Granularity Histogram for Device nvme3n1 (lbads: 12 - 4096 bytes)
Total I/Os: 134993
     Bytes         : count    distribution
        0          : 12378    |****                                    |
        32768      : 3622     |*                                       |
        65536      : 2        |                                        |
        1048576    : 71       |                                        |
        2097152    : 827      |                                        |
        2588672    : 50       |                                        |
        3670016    : 62       |                                        |
        4194304    : 357      |                                        |
        4718592    : 62       |                                        |
        5799936    : 50       |                                        |
        6291456    : 644      |                                        |
        7340032    : 71       |                                        |
        8388608    : 116797   |****************************************|

I/O Alignment Histogram for Device nvme3n1
     Bytes               : count    distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 12378    |**********                              |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 49360    |****************************************|
     65536 -> 131071     : 2        |                                        |
    131072 -> 262143     : 0        |                                        |
    262144 -> 524287     : 0        |                                        |
    524288 -> 1048575    : 36747    |*****************************           |
   1048576 -> 2097151    : 36506    |*****************************           |

/sys/block/nvme3n1/queue/
max_segments           = 256
max_segment_size       = 4294967295
max_hw_sectors_kb      = 8192
max_sectors_kb         = 8192
virt_boundary_mask     = 0
logical_block_size     = 4096
physical_block_size    = 4096
minimum_io_size        = 4096
optimal_io_size        = 0
dma_alignment          = 3
chunk_sectors          = 0
nr_requests            = 1023

nvme id-ctrl /dev/nvme3n1
mdts      : 11
cntrltype : 1
sgls      : 0x80001

xfs_info /mnt/mdts
meta-data=/dev/nvme3n1           isize=512    agcount=4, agsize=163840 blks
         =                       sectsz=32768 attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0   metadir=0
data     =                       bsize=32768  blocks=655360, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=32768  ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=32768  blocks=5446, version=2
         =                       sectsz=32768 sunit=1 blks, lazy-count=1
realtime =none                   extsz=32768  blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
         =                       zoned=0      start=0 reserved=0

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
Daniel Gomez (3):
      dma-helpers: chunk dma_blk_cb at IOV_MAX
      hw/nvme: drop DMA-path IOV_MAX guard
      hw/nvme: allow mdts up to 8192 KiB

 hw/nvme/ctrl.c       | 15 +++++++++------
 system/dma-helpers.c |  3 ++-
 2 files changed, 11 insertions(+), 7 deletions(-)
---
base-commit: 98b060da3a4f92b2a994ead5b16a87e783baf77c
change-id: 20260528-align-nvme-mdts-with-linux-67d618f6730b

Best regards,
--  
Daniel Gomez <da.gomez@samsung.com>