From: Leon Romanovsky <leonro@nvidia.com>
This patch changes the length variables from unsigned int to size_t.
Using size_t ensures that we can handle larger sizes, as size_t is
always equal to or larger than the previously used u32 type.

Originally, u32 was used because the blk-mq-dma code evolved from the
scatter-gather implementation, which uses unsigned int to describe lengths.
This change will also allow us to reuse the existing struct phys_vec in
places that don't need scatter-gather.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
---
block/blk-mq-dma.c | 8 ++++++--
drivers/nvme/host/pci.c | 4 ++--
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index e9108ccaf4b0..e7d9b54c3eed 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -8,7 +8,7 @@
 
 struct phys_vec {
 	phys_addr_t paddr;
-	u32 len;
+	size_t len;
 };
 
 static bool __blk_map_iter_next(struct blk_map_iter *iter)
@@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 		struct phys_vec *vec)
 {
 	enum dma_data_direction dir = rq_dma_dir(req);
-	unsigned int mapped = 0;
 	unsigned int attrs = 0;
+	size_t mapped = 0;
 	int error;
 
 	iter->addr = state->addr;
@@ -296,6 +296,8 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
 	blk_rq_map_iter_init(rq, &iter);
 	while (blk_map_iter_next(rq, &iter, &vec)) {
 		*last_sg = blk_next_sg(last_sg, sglist);
+
+		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
 		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
 				offset_in_page(vec.paddr));
 		nsegs++;
@@ -416,6 +418,8 @@ int blk_rq_map_integrity_sg(struct request *rq, struct scatterlist *sglist)
 	while (blk_map_iter_next(rq, &iter, &vec)) {
 		sg = blk_next_sg(&sg, sglist);
+
+		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
 		sg_set_page(sg, phys_to_page(vec.paddr), vec.len,
 				offset_in_page(vec.paddr));
 		segments++;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e5ca8301bb8b..b61ec62b0ec6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -290,14 +290,14 @@ struct nvme_iod {
 	u8 flags;
 	u8 nr_descriptors;
 
-	unsigned int total_len;
+	size_t total_len;
 	struct dma_iova_state dma_state;
 	void *descriptors[NVME_MAX_NR_DESCRIPTORS];
 	struct nvme_dma_vec *dma_vecs;
 	unsigned int nr_dma_vecs;
 
 	dma_addr_t meta_dma;
-	unsigned int meta_total_len;
+	size_t meta_total_len;
 	struct dma_iova_state meta_dma_state;
 	struct nvme_sgl_desc *meta_descriptor;
 };
--
2.51.1
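
A note on the WARN_ON_ONCE(overflows_type(...)) lines added above: sg_set_page()
still takes an unsigned int length, so once vec.len is a size_t the warning
catches any value that would be truncated on the way in. A rough userspace
sketch of that check (an illustration only, not the kernel's type-generic
overflows_type() macro from include/linux/overflow.h) might look like:

/*
 * Userspace approximation of the overflows_type() guard: a size_t length
 * larger than UINT_MAX would be silently truncated by an unsigned int
 * parameter such as the one sg_set_page() takes.
 */
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool len_overflows_uint(size_t len)
{
	return len > UINT_MAX;
}

int main(void)
{
	size_t small = 4096;
	size_t huge = (size_t)UINT_MAX + 1;	/* only meaningful on 64-bit */

	printf("%zu overflows unsigned int: %d\n", small, len_overflows_uint(small));
	printf("%zu overflows unsigned int: %d\n", huge, len_overflows_uint(huge));
	return 0;
}

The kernel macro is type-generic; this sketch only covers the
size_t -> unsigned int case that matters in the hunks above.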
On Mon, Nov 17, 2025 at 09:22:43PM +0200, Leon Romanovsky wrote:
> struct phys_vec {
> 	phys_addr_t paddr;
> -	u32 len;
> +	size_t len;
> };

So we're now going to increase memory usage by 50% again after just
reducing it by removing the scatterlist?
On Tue, Nov 18, 2025 at 06:03:11AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 17, 2025 at 09:22:43PM +0200, Leon Romanovsky wrote:
> > struct phys_vec {
> > 	phys_addr_t paddr;
> > -	u32 len;
> > +	size_t len;
> > };
>
> So we're now going to increase memory usage by 50% again after just
> reducing it by removing the scatterlist?

It is slightly less.

Before this change: 96 bits
After this change (on a 64-bit system): 128 bits.

It is a 33% increase per structure.

So what is the resolution? Should I drop this patch or not?

Thanks
On Wed, 19 Nov 2025 11:55:16 +0200
Leon Romanovsky <leon@kernel.org> wrote:
> On Tue, Nov 18, 2025 at 06:03:11AM +0100, Christoph Hellwig wrote:
> > So we're now going to increase memory usage by 50% again after just
> > reducing it by removing the scatterlist?
>
> It is slightly less.
>
> Before this change: 96 bits

Did you actually look?

There will normally be 4 bytes of padding at the end of the structure.
About the only place where it will be 12 bytes is a 32-bit system with a
64-bit phys_addr_t that aligns 64-bit items on 32-bit boundaries - so x86.

David
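
To make the padding point concrete, here is a small standalone sketch
(userspace; it assumes a 64-bit phys_addr_t and an LP64 target, i.e. an
8-byte size_t - assumptions, not every possible config): both layouts come
out at 16 bytes, because the u32 variant already carries 4 bytes of tail
padding.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel type; assumes CONFIG_PHYS_ADDR_T_64BIT. */
typedef uint64_t phys_addr_t;

struct phys_vec_u32 {			/* layout before the patch */
	phys_addr_t paddr;
	uint32_t len;			/* 4 bytes of tail padding follow */
};

struct phys_vec_size_t {		/* layout after the patch (LP64) */
	phys_addr_t paddr;
	size_t len;
};

int main(void)
{
	printf("u32 len:    %zu bytes\n", sizeof(struct phys_vec_u32));	/* 16 on x86-64 */
	printf("size_t len: %zu bytes\n", sizeof(struct phys_vec_size_t));	/* 16 on x86-64 */
	return 0;
}

The 12-byte case David mentions would only show up where 64-bit types get
4-byte alignment, e.g. 32-bit x86 with a 64-bit phys_addr_t.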
On Wed, Nov 19, 2025 at 01:36:15PM +0000, David Laight wrote:
> Did you actually look?

No, I simply performed sizeof(phys_addr_t) + sizeof(size_t).

> There will normally be 4 bytes of padding at the end of the structure.
>
> About the only place where it will be 12 bytes is a 32bit system with
> 64bit phyaddr that aligns 64bit items on 32bit boundaries - so x86.

So does it mean that Christoph's comment about the size increase is not correct?

Thanks
On Wed, 19 Nov 2025 15:58:21 +0200
Leon Romanovsky <leon@kernel.org> wrote:
> So does it mean that Christoph's comment about size increase is not correct?

Correct - i.e. there is no size increase.
On Wed, Nov 19, 2025 at 11:55:16AM +0200, Leon Romanovsky wrote:
> So what is the resolution? Should I drop this patch or not?

I think it should be dropped. I don't think having to use one 96-bit
structure per 4GB worth of memory should be a deal breaker.
On Wed, Nov 19, 2025 at 11:10:48AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 19, 2025 at 11:55:16AM +0200, Leon Romanovsky wrote:
> > So what is the resolution? Should I drop this patch or not?
>
> I think it should be dropped. I don't think having to use one 96-bit
> structure per 4GB worth of memory should be a deal breaker.

Christoph,

It seems like there is no size change before and after my change.
https://lore.kernel.org/linux-nvme/20251117-nvme-phys-types-v2-0-c75a60a2c468@nvidia.com/T/#ma575c050517e91e7630683cf193e39d812338fa4

Thanks
On Wed, Nov 19, 2025 at 11:10:48AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 19, 2025 at 11:55:16AM +0200, Leon Romanovsky wrote:
> > So what is the resolution? Should I drop this patch or not?
>
> I think it should be dropped. I don't think having to use one 96-bit
> structure per 4GB worth of memory should be a deal breaker.

No problem, will drop.
On Mon, Nov 17, 2025 at 09:22:43PM +0200, Leon Romanovsky wrote:
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index e5ca8301bb8b..b61ec62b0ec6 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -290,14 +290,14 @@ struct nvme_iod {
> 	u8 flags;
> 	u8 nr_descriptors;
>
> -	unsigned int total_len;
> +	size_t total_len;

Changing the generic phys_vec sounds fine, but the nvme driver has an 8MB
limitation on how large an IO can be, so I don't think the driver's
length needs to match the phys_vec type.
On Mon, Nov 17, 2025 at 12:35:40PM -0700, Keith Busch wrote:
> > +	size_t total_len;
>
> Changing the generic phys_vec sounds fine, but the nvme driver has a 8MB
> limitation on how large an IO can be, so I don't think the driver's
> length needs to match the phys_vec type.

With the new dma mapping interface we could lift that limit for
SGL-based controllers, as we basically only have a nr_segments limit now.
Not that I'm trying to argue for multi-GB I/O..
On Tue, Nov 18, 2025 at 06:18:23AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 17, 2025 at 12:35:40PM -0700, Keith Busch wrote:
> > > +	size_t total_len;
> >
> > Changing the generic phys_vec sounds fine, but the nvme driver has a 8MB
> > limitation on how large an IO can be, so I don't think the driver's
> > length needs to match the phys_vec type.
>
> With the new dma mapping interface we could lift that limits for
> SGL-based controllers as we basically only have a nr_segments limit now.
> Not that I'm trying to argue for multi-GB I/O..

It's not a bad idea. The tricky part is in the timeout handling. If we
allow very large IO, I think we need a dynamic timeout value to account
for the link's throughput. We can already trigger blk-mq timeouts if you
saturate enough queues with max sized IO, despite everything else
working-as-designed.
On Mon, Nov 17, 2025 at 12:35:40PM -0700, Keith Busch wrote:
> > - unsigned int total_len;
> > + size_t total_len;
>
> Changing the generic phys_vec sounds fine, but the nvme driver has a 8MB
> limitation on how large an IO can be, so I don't think the driver's
> length needs to match the phys_vec type.

I'm a big fan of keeping the same types in all places, but I can drop the
nvme changes if you think that is the right thing to do.

Thanks