block/bio.c | 1 + 1 file changed, 1 insertion(+)
When cloning a bio, the `bio.bi_vcnt` field is not cloned. This is a
problem if users want to perform bounds checks on the `bio.bi_io_vec`
field.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
block/bio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/bio.c b/block/bio.c
index f0c416e5931d9..334eedf312803 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -870,6 +870,7 @@ struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src,
return NULL;
}
bio->bi_io_vec = bio_src->bi_io_vec;
+ bio->bi_vcnt = bio_src->bi_vcnt;
return bio;
}
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20250215-clone-bi_vcnt-f3f770988894
Best regards,
--
Andreas Hindborg <a.hindborg@kernel.org>
On Sat, Feb 15, 2025 at 11:58:15AM +0100, Andreas Hindborg wrote: > When cloning a bio, the `bio.bi_vcnt` field is not cloned. This is a > problem if users want to perform bounds checks on the `bio.bi_io_vec` > field. Right now bi_vcnt is supposed to be an implementation detail for bio_add_*, which obviously can't be called on cloned bio. Except for the usual abuse in bcache/bcachefs that has mostly kept up except for a few read-only checks in the completion routines which also can't be called on cloned bios. It would be nice to use it as a __counted_by bound for bi_io_vec, but until that is supported on pointers in addition to the flexible arrays we can't actually do that. So as-is I don't really see a point in just assigning the value if we don't actually use it.
On 15/02/2025 10:58, Andreas Hindborg wrote: > When cloning a bio, the `bio.bi_vcnt` field is not cloned. This is a > problem if users want to perform bounds checks on the `bio.bi_io_vec` > field. Is this fixing a potential problem? Or fixing a real issue? Thanks, John > > Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> > --- > block/bio.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/block/bio.c b/block/bio.c > index f0c416e5931d9..334eedf312803 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -870,6 +870,7 @@ struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, > return NULL; > } > bio->bi_io_vec = bio_src->bi_io_vec; > + bio->bi_vcnt = bio_src->bi_vcnt; > > return bio; > } > > --- > base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3 > change-id: 20250215-clone-bi_vcnt-f3f770988894 > > Best regards,
"John Garry" <john.g.garry@oracle.com> writes: > On 15/02/2025 10:58, Andreas Hindborg wrote: >> When cloning a bio, the `bio.bi_vcnt` field is not cloned. This is a >> problem if users want to perform bounds checks on the `bio.bi_io_vec` >> field. > > Is this fixing a potential problem? Or fixing a real issue? It is fixing a problem I ran into in rnull, the rust null block implementation. When running with debug assertions enabled, a bound check on `bi_io_vec` fails for split bio, because `bio_vcnt` becomes zero in the cloned bio. I can work around this by not using a slice type to represent `bi_io_vec` in rust, not a big deal. But I am genuinely curious if there is a reason for not setting `bi_vcnt` during a clone. As far as I can tell, it should be safe to set. `bi_vcnt` being zero does not seem to have any effect other than to puzzle developers debugging the code. Maybe I missed something? Best regards, Andreas Hindborg
On 18/02/2025 11:40, Andreas Hindborg wrote: > "John Garry" <john.g.garry@oracle.com> writes: > >> On 15/02/2025 10:58, Andreas Hindborg wrote: >>> When cloning a bio, the `bio.bi_vcnt` field is not cloned. This is a >>> problem if users want to perform bounds checks on the `bio.bi_io_vec` >>> field. >> >> Is this fixing a potential problem? Or fixing a real issue? > > It is fixing a problem I ran into in rnull, the rust null block > implementation. When running with debug assertions enabled, a bound > check on `bi_io_vec` fails for split bio, because `bio_vcnt` becomes > zero in the cloned bio. > > I can work around this by not using a slice type to represent > `bi_io_vec` in rust, not a big deal. > > But I am genuinely curious if there is a reason for not setting > `bi_vcnt` during a clone. I think that it came from commit 59d276fe0 (with the addition of bio_clone_fast()), where we assume that the cloned bio is not having the bio_vec touched and so does not need to know bi_vcnt (or bi_max_vecs). And it is inefficient to needlessly set bi_vcnt then. > As far as I can tell, it should be safe to > set. `bi_vcnt` being zero does not seem to have any effect other than to > puzzle developers debugging the code. > > Maybe I missed something? > > Thanks, John
On 2/18/25 9:12 AM, John Garry wrote: > On 18/02/2025 11:40, Andreas Hindborg wrote: >> But I am genuinely curious if there is a reason for not setting >> `bi_vcnt` during a clone. > > I think that it came from commit 59d276fe0 (with the addition of > bio_clone_fast()), where we assume that the cloned bio is not having the > bio_vec touched and so does not need to know bi_vcnt (or bi_max_vecs). > And it is inefficient to needlessly set bi_vcnt then. Hmm ... I prefer paying the very small performance hit caused by copying bi_vcnt rather than having to deal with the inconsistency caused by not copying that data structure member. Thanks, Bart.
On 18/02/2025 22:21, Bart Van Assche wrote: > On 2/18/25 9:12 AM, John Garry wrote: >> On 18/02/2025 11:40, Andreas Hindborg wrote: >>> But I am genuinely curious if there is a reason for not setting >>> `bi_vcnt` during a clone. >> >> I think that it came from commit 59d276fe0 (with the addition of >> bio_clone_fast()), where we assume that the cloned bio is not having >> the bio_vec touched and so does not need to know bi_vcnt (or >> bi_max_vecs). And it is inefficient to needlessly set bi_vcnt then. > > Hmm ... I prefer paying the very small performance hit caused by copying > bi_vcnt rather than having to deal with the inconsistency caused by not > copying that data structure member. From my experience, setting anything which is not strictly necessary in the fastpath code is generally not wanted. Thanks, John
"John Garry" <john.g.garry@oracle.com> writes:
> On 18/02/2025 11:40, Andreas Hindborg wrote:
>> "John Garry" <john.g.garry@oracle.com> writes:
>>
>>> On 15/02/2025 10:58, Andreas Hindborg wrote:
>>>> When cloning a bio, the `bio.bi_vcnt` field is not cloned. This is a
>>>> problem if users want to perform bounds checks on the `bio.bi_io_vec`
>>>> field.
>>>
>>> Is this fixing a potential problem? Or fixing a real issue?
>>
>> It is fixing a problem I ran into in rnull, the rust null block
>> implementation. When running with debug assertions enabled, a bound
>> check on `bi_io_vec` fails for split bio, because `bio_vcnt` becomes
>> zero in the cloned bio.
>>
>> I can work around this by not using a slice type to represent
>> `bi_io_vec` in rust, not a big deal.
>>
>> But I am genuinely curious if there is a reason for not setting
>> `bi_vcnt` during a clone.
>
> I think that it came from commit 59d276fe0 (with the addition of
> bio_clone_fast()), where we assume that the cloned bio is not having the
> bio_vec touched and so does not need to know bi_vcnt (or bi_max_vecs).
> And it is inefficient to needlessly set bi_vcnt then.
I see. That is a few days ago. I am quite confident that for modern
hardware and workloads, this assignment will not have any measurable
impact on performance.
Can we add it back?
I understand if you would prefer not to, since it is not strictly
necessary. But in that case, I would suggest patching the documentation
of `struct bio` something like this:
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -255,7 +255,8 @@ struct bio {
struct bio_integrity_payload *bi_integrity; /* data integrity */
#endif
- unsigned short bi_vcnt; /* how many bio_vec's */
+ unsigned short bi_vcnt; /* how many bio_vec's. Not valid if this bio is
+ a clone (flagged BIO_CLONED). */
/*
* Everything starting with bi_max_vecs will be preserved by bio_reset()
Best regards,
Andreas Hindborg
© 2016 - 2025 Red Hat, Inc.