From: Shida Zhang <zhangshida@kylinos.cn>
Andreas point out that multiple completions can race setting
bi_status.
The check (parent->bi_status) and the subsequent write are not an
atomic operation. The value of parent->bi_status could have changed
between the time you read it for the if check and the time you write
to it. So we use cmpxchg to fix the race, as suggested by Christoph.
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
---
block/bio.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 1b5e4577f4c..097c1cd2054 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -314,8 +314,9 @@ static struct bio *__bio_chain_endio(struct bio *bio)
{
struct bio *parent = bio->bi_private;
- if (bio->bi_status && !parent->bi_status)
- parent->bi_status = bio->bi_status;
+ if (bio->bi_status)
+ cmpxchg(&parent->bi_status, 0, bio->bi_status);
+
bio_put(bio);
return parent;
}
--
2.34.1
On Mon, Dec 1, 2025 at 10:05 AM zhangshida <starzhangzsd@gmail.com> wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
>
> Andreas point out that multiple completions can race setting
> bi_status.
>
> The check (parent->bi_status) and the subsequent write are not an
> atomic operation. The value of parent->bi_status could have changed
> between the time you read it for the if check and the time you write
> to it. So we use cmpxchg to fix the race, as suggested by Christoph.
>
> Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
> ---
> block/bio.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 1b5e4577f4c..097c1cd2054 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -314,8 +314,9 @@ static struct bio *__bio_chain_endio(struct bio *bio)
> {
> struct bio *parent = bio->bi_private;
>
> - if (bio->bi_status && !parent->bi_status)
> - parent->bi_status = bio->bi_status;
> + if (bio->bi_status)
> + cmpxchg(&parent->bi_status, 0, bio->bi_status);
Hmm. I don't think cmpxchg() actually is of any value here: for all
the chained bios, bi_status is initialized to 0, and it is only set
again (to a non-0 value) when a failure occurs. When there are
multiple failures, we only need to make sure that one of those
failures is eventually reported, but for that, a simple assignment is
enough here. The cmpxchg() won't guarantee that a specific error value
will survive; it all still depends on the timing. The cmpxchg() only
makes it look like something special is happening here with respect to
ordering.
> +
> bio_put(bio);
> return parent;
> }
> --
> 2.34.1
>
Thanks,
Andreas
On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > - if (bio->bi_status && !parent->bi_status) > > - parent->bi_status = bio->bi_status; > > + if (bio->bi_status) > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > Hmm. I don't think cmpxchg() actually is of any value here: for all > the chained bios, bi_status is initialized to 0, and it is only set > again (to a non-0 value) when a failure occurs. When there are > multiple failures, we only need to make sure that one of those > failures is eventually reported, but for that, a simple assignment is > enough here. A simple assignment doesn't guarantee atomicy. It also overrides earlier with later status codes, which might not be desirable.
On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > - if (bio->bi_status && !parent->bi_status) > > > - parent->bi_status = bio->bi_status; > > > + if (bio->bi_status) > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > the chained bios, bi_status is initialized to 0, and it is only set > > again (to a non-0 value) when a failure occurs. When there are > > multiple failures, we only need to make sure that one of those > > failures is eventually reported, but for that, a simple assignment is > > enough here. > > A simple assignment doesn't guarantee atomicy. Well, we've already discussed that bi_status is a single byte and so tearing won't be an issue. Otherwise, WRITE_ONCE() would still be enough here. > It also overrides earlier with later status codes, which might not be desirable. In an A -> B bio chain, we have two competing bi_status writers: (1) when the A fails, B->bi_status will be updated using cmpxchg(), (2) when B fails, bi_status will be assigned a non-0 value. In that scenario, B failures will always win over A failures. In addition, when we have an A -> B -> C bio chain, we still won't get "first failure wins" semantics because a failure of A will only be propagated to C once B completes as well. To "fix" that, we'd have to "chain" all bios to the same parent instead. But I don't think any of that is really needed. Andreas
© 2016 - 2025 Red Hat, Inc.