io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
the number of bvecs in the request. However, a single bvec may be split
into multiple segments depending on the queue limits, so the segment
count can overestimate the number of bvecs. For ublk devices, the only
current users of io_buffer_register_bvec(), the relevant queue limits
(virt_boundary_mask, seg_boundary_mask, max_segments, and
max_segment_size) can all be set arbitrarily by the ublk server process.
Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
loop actually yields. However, continue using blk_rq_nr_phys_segments()
as an upper bound on the number of bvecs when allocating imu to avoid
needing to iterate the bvecs a second time.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
---
io_uring/rsrc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d787c16dc1c3..301c6899d240 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -941,12 +941,12 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
struct io_rsrc_data *data = &ctx->buf_table;
struct req_iterator rq_iter;
struct io_mapped_ubuf *imu;
struct io_rsrc_node *node;
- struct bio_vec bv, *bvec;
- u16 nr_bvecs;
+ struct bio_vec bv;
+ unsigned int nr_bvecs = 0;
int ret = 0;
io_ring_submit_lock(ctx, issue_flags);
if (index >= data->nr) {
ret = -EINVAL;
@@ -963,32 +963,34 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
if (!node) {
ret = -ENOMEM;
goto unlock;
}
- nr_bvecs = blk_rq_nr_phys_segments(rq);
- imu = io_alloc_imu(ctx, nr_bvecs);
+ /*
+ * blk_rq_nr_phys_segments() may overestimate the number of bvecs
+ * but avoids needing to iterate over the bvecs
+ */
+ imu = io_alloc_imu(ctx, blk_rq_nr_phys_segments(rq));
if (!imu) {
kfree(node);
ret = -ENOMEM;
goto unlock;
}
imu->ubuf = 0;
imu->len = blk_rq_bytes(rq);
imu->acct_pages = 0;
imu->folio_shift = PAGE_SHIFT;
- imu->nr_bvecs = nr_bvecs;
refcount_set(&imu->refs, 1);
imu->release = release;
imu->priv = rq;
imu->is_kbuf = true;
imu->dir = 1 << rq_data_dir(rq);
- bvec = imu->bvec;
rq_for_each_bvec(bv, rq, rq_iter)
- *bvec++ = bv;
+ imu->bvec[nr_bvecs++] = bv;
+ imu->nr_bvecs = nr_bvecs;
node->buf = imu;
data->nodes[index] = node;
unlock:
io_ring_submit_unlock(ctx, issue_flags);
--
2.45.2
On Tue, 11 Nov 2025 12:15:29 -0700, Caleb Sander Mateos wrote:
> io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> the number of bvecs in the request. However, bvecs may be split into
> multiple segments depending on the queue limits. Thus, the number of
> segments may overestimate the number of bvecs. For ublk devices, the
> only current users of io_buffer_register_bvec(), virt_boundary_mask,
> seg_boundary_mask, max_segments, and max_segment_size can all be set
> arbitrarily by the ublk server process.
> Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> loop actually yields. However, continue using blk_rq_nr_phys_segments()
> as an upper bound on the number of bvecs when allocating imu to avoid
> needing to iterate the bvecs a second time.
>
> [...]
Applied, thanks!
[1/1] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
commit: 2d0e88f3fd1dcb37072d499c36162baf5b009d41
Best regards,
--
Jens Axboe
On Tue, Nov 11, 2025 at 12:15:29PM -0700, Caleb Sander Mateos wrote:
> io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> the number of bvecs in the request. However, bvecs may be split into
> multiple segments depending on the queue limits. Thus, the number of
> segments may overestimate the number of bvecs. For ublk devices, the
> only current users of io_buffer_register_bvec(), virt_boundary_mask,
> seg_boundary_mask, max_segments, and max_segment_size can all be set
> arbitrarily by the ublk server process.
> Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> loop actually yields. However, continue using blk_rq_nr_phys_segments()
> as an upper bound on the number of bvecs when allocating imu to avoid
> needing to iterate the bvecs a second time.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
BTW, this issue may not be a problem in practice, because ->nr_bvecs is
only used in iov_iter_bvec(), where 'offset' and 'len' control how far
the iterator can reach, so the uninitialized bvecs basically won't be
touched. Otherwise, the issue would have been triggered somewhere by now.
Also the bvec allocation may be avoided in case of single-bio request,
which can be one future optimization.
Thanks,
Ming
On Tue, Nov 11, 2025 at 5:01 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Nov 11, 2025 at 12:15:29PM -0700, Caleb Sander Mateos wrote:
> > io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> > the number of bvecs in the request. However, bvecs may be split into
> > multiple segments depending on the queue limits. Thus, the number of
> > segments may overestimate the number of bvecs. For ublk devices, the
> > only current users of io_buffer_register_bvec(), virt_boundary_mask,
> > seg_boundary_mask, max_segments, and max_segment_size can all be set
> > arbitrarily by the ublk server process.
> > Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> > loop actually yields. However, continue using blk_rq_nr_phys_segments()
> > as an upper bound on the number of bvecs when allocating imu to avoid
> > needing to iterate the bvecs a second time.
> >
> > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>
> BTW, this issue may not be a problem because ->nr_bvecs is only used in
> iov_iter_bvec(), in which 'offset' and 'len' can control how far the
> iterator can reach, so the uninitialized bvecs won't be touched basically.
I see your point, but what about iov_iter_extract_bvec_pages()? That
looks like it only uses i->nr_segs to bound the iteration, not
i->count. Hopefully there aren't any other helpers relying on nr_segs.
If you really don't think it's a problem, I'm fine deferring the patch
to 6.19. We haven't encountered any problems caused by this bug, but
we haven't tested with any non-default virt_boundary_mask,
seg_boundary_mask, max_segments, or max_segment_size on the ublk
device.
>
> Otherwise, the issue should have been triggered somewhere.
>
> Also the bvec allocation may be avoided in case of single-bio request,
> which can be one future optimization.
I'm not sure what you're suggesting. The bio_vec array is a flexible
array member of io_mapped_ubuf, so unless we add another pointer
indirection, I don't see how to reuse the bio's bi_io_vec array.
io_mapped_ubuf is also used for user registered buffers, where this
optimization isn't possible, so it may not be a clear win.
Best,
Caleb
On Tue, Nov 11, 2025 at 05:44:18PM -0800, Caleb Sander Mateos wrote:
> On Tue, Nov 11, 2025 at 5:01 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Nov 11, 2025 at 12:15:29PM -0700, Caleb Sander Mateos wrote:
> > > io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> > > the number of bvecs in the request. However, bvecs may be split into
> > > multiple segments depending on the queue limits. Thus, the number of
> > > segments may overestimate the number of bvecs. For ublk devices, the
> > > only current users of io_buffer_register_bvec(), virt_boundary_mask,
> > > seg_boundary_mask, max_segments, and max_segment_size can all be set
> > > arbitrarily by the ublk server process.
> > > Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> > > loop actually yields. However, continue using blk_rq_nr_phys_segments()
> > > as an upper bound on the number of bvecs when allocating imu to avoid
> > > needing to iterate the bvecs a second time.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
> >
> > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> >
> > BTW, this issue may not be a problem because ->nr_bvecs is only used in
> > iov_iter_bvec(), in which 'offset' and 'len' can control how far the
> > iterator can reach, so the uninitialized bvecs won't be touched basically.
>
> I see your point, but what about iov_iter_extract_bvec_pages()? That
> looks like it only uses i->nr_segs to bound the iteration, not
> i->count. Hopefully there aren't any other helpers relying on nr_segs.
iov_iter_extract_bvec_pages() is only called from iov_iter_extract_pages(),
in which 'maxsize' is capped by i->count.
> If you really don't think it's a problem, I'm fine deferring the patch
> to 6.19. We haven't encountered any problems caused by this bug, but
> we haven't tested with any non-default virt_boundary_mask,
> seg_boundary_mask, max_segments, or max_segment_size on the ublk
> device.
IMO it belongs in v6.18: your fix not only makes the code more robust,
it is also the correct thing to do.
I am just wondering why the issue wasn't triggered, since we have lots
of test cases (rw verify, mkfs & mount, ...).
>
> >
> > Otherwise, the issue should have been triggered somewhere.
> >
> > Also the bvec allocation may be avoided in case of single-bio request,
> > which can be one future optimization.
>
> I'm not sure what you're suggesting. The bio_vec array is a flexible
> array member of io_mapped_ubuf, so unless we add another pointer
> indirection, I don't see how to reuse the bio's bi_io_vec array.
> io_mapped_ubuf is also used for user registered buffers, where this
> optimization isn't possible, so it may not be a clear win.
io_mapped_ubuf->acct_pages could be the field reused for the indirect
pointer; see lo_rw_aio() for how to reuse the bvec array.
Thanks,
Ming
On 11/11/25 11:15, Caleb Sander Mateos wrote:
> io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> the number of bvecs in the request. However, bvecs may be split into
> multiple segments depending on the queue limits. Thus, the number of
> segments may overestimate the number of bvecs. For ublk devices, the
> only current users of io_buffer_register_bvec(), virt_boundary_mask,
> seg_boundary_mask, max_segments, and max_segment_size can all be set
> arbitrarily by the ublk server process.
> Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> loop actually yields. However, continue using blk_rq_nr_phys_segments()
> as an upper bound on the number of bvecs when allocating imu to avoid
> needing to iterate the bvecs a second time.
>
> Signed-off-by: Caleb Sander Mateos<csander@purestorage.com>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck