block/file-posix.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
For Linux 5.10/5.11, qemu write zeros to a multipath device using
ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
permanently. Fallback to pwritev instead of exit for -EBUSY error.
The issue was introduced in Linux 5.10:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02
Fixed in Linux 5.12:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
Signed-off-by: ChangLimin <changlm@chinatelecom.cn>
---
block/file-posix.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 20e14f8e96..d4054ac9cb 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1624,8 +1624,12 @@ static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
} while (errno == EINTR);
ret = translate_err(-errno);
- if (ret == -ENOTSUP) {
- s->has_write_zeroes = false;
+ switch (ret) {
+ case -ENOTSUP:
+ s->has_write_zeroes = false; /* fall through */
+ case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath devices */
+ return -ENOTSUP;
+ break;
}
}
#endif
--
2.27.0
On 3/22/21 5:25 AM, ChangLimin wrote: > For Linux 5.10/5.11, qemu write zeros to a multipath device using > ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY > permanently. Fallback to pwritev instead of exit for -EBUSY error. > > The issue was introduced in Linux 5.10: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02 > > Fixed in Linux 5.12: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d > > Signed-off-by: ChangLimin <changlm@chinatelecom.cn> To be clear, when I asked "When do we get -EINVAL?" it wasn't because I doubted that we would ever get it, I was just unclear of the circumstances in which we might receive EINVAL and was hoping you would explain it to me. > --- > block/file-posix.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/block/file-posix.c b/block/file-posix.c > index 20e14f8e96..d4054ac9cb 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -1624,8 +1624,12 @@ static ssize_t > handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb) > } while (errno == EINTR); > > ret = translate_err(-errno); > - if (ret == -ENOTSUP) { > - s->has_write_zeroes = false; > + switch (ret) { > + case -ENOTSUP: > + s->has_write_zeroes = false; /* fall through */ > + case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath > devices */ > + return -ENOTSUP; > + break; What effect does this have, now? We'll return ENOTSUP but we won't disable trying it again in the future, is that right? Kevin, is this what you had in mind? --js > } > } > #endif > -- > 2.27.0 >
On 22.03.21 10:25, ChangLimin wrote: > For Linux 5.10/5.11, qemu write zeros to a multipath device using > ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY > permanently. So as far as I can track back the discussion, Kevin asked on v1 why we’d set has_write_zeroes to false, i.e. whether the EBUSY might not go away at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then. You haven’t explicitly replied to that question (as far as I can see), so it kind of still stands. Implicitly, there are two conflicting answers in this patch: On one hand, the commit message says “permanently”, and this is what you told Nir as a realistic case where this can occur. So that to me implies that we actually should not retry BLKZEROOUT, because the EBUSY will remain, and that condition won’t change while the block device is in use by qemu. On the other hand, in the code, you have decided not to reset has_write_zeroes to false, so the implementation will retry. So I don’t quite understand. Should we keep trying BLKZEROOUT or is there no chance of it working after it has at one point failed with EBUSY? (Are there other cases besides what’s described in this commit message where EBUSY might be returned and it is only temporary?) > Fallback to pwritev instead of exit for -EBUSY error. > > The issue was introduced in Linux 5.10: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02 > > Fixed in Linux 5.12: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d > > Signed-off-by: ChangLimin <changlm@chinatelecom.cn> > --- > block/file-posix.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/block/file-posix.c b/block/file-posix.c > index 20e14f8e96..d4054ac9cb 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -1624,8 +1624,12 @@ static ssize_t > handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb) > } while (errno == EINTR); > > ret = translate_err(-errno); > - if (ret == -ENOTSUP) { > - s->has_write_zeroes = false; > + switch (ret) { > + case -ENOTSUP: > + s->has_write_zeroes = false; /* fall through */ > + case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath > devices */ > + return -ENOTSUP; > + break; (Not sure why this break is here.) Max > } > } > #endif > -- > 2.27.0 >
On Wed, Mar 24, 2021 at 4:52 PM Max Reitz <mreitz@redhat.com> wrote: > On 22.03.21 10:25, ChangLimin wrote: > > For Linux 5.10/5.11, qemu write zeros to a multipath device using > > ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY > > permanently. > > So as far as I can track back the discussion, Kevin asked on v1 why we’d > set has_write_zeroes to false, i.e. whether the EBUSY might not go away > at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then. > You haven’t explicitly replied to that question (as far as I can see), > so it kind of still stands. > > Implicitly, there are two conflicting answers in this patch: On one > hand, the commit message says “permanently”, and this is what you told > Nir as a realistic case where this can occur. I'm afraid ChangLimin did not answer my question. I'm looking for real world used case when qemu cannot write zeros to multipath device, when nobody else is using the device. I tried to reproduce this on Fedora (kernel 5.10) with qemu-img convert, once with a multipath device, and once with logical volume on a vg created on the multipath device, and I could not reproduce this issue. If I understand the kernel change correctly, this can happen when there is a mounted file system on top of the multipath device. I don't think we have a use case when qemu accesses a multipath device when the device is used by a file system, but maybe I missed something. > So that to me implies > that we actually should not retry BLKZEROOUT, because the EBUSY will > remain, and that condition won’t change while the block device is in use > by qemu. > > On the other hand, in the code, you have decided not to reset > has_write_zeroes to false, so the implementation will retry. > EBUSY is usually a temporary error, so retrying makes sense. The question is if we really can write zeroes manually in this case? > So I don’t quite understand. Should we keep trying BLKZEROOUT or is > there no chance of it working after it has at one point failed with > EBUSY? (Are there other cases besides what’s described in this commit > message where EBUSY might be returned and it is only temporary?) > > > Fallback to pwritev instead of exit for -EBUSY error. > > > > The issue was introduced in Linux 5.10: > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02 > > > > Fixed in Linux 5.12: > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d > > > > Signed-off-by: ChangLimin <changlm@chinatelecom.cn> > > --- > > block/file-posix.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > index 20e14f8e96..d4054ac9cb 100644 > > --- a/block/file-posix.c > > +++ b/block/file-posix.c > > @@ -1624,8 +1624,12 @@ static ssize_t > > handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb) > > } while (errno == EINTR); > > > > ret = translate_err(-errno); > > - if (ret == -ENOTSUP) { > > - s->has_write_zeroes = false; > > + switch (ret) { > > + case -ENOTSUP: > > + s->has_write_zeroes = false; /* fall through */ > > + case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath > > devices */ > > + return -ENOTSUP; > > + break; > > (Not sure why this break is here.) > > Max > > > } > > } > > #endif > > -- > > 2.27.0 > > > > >
© 2016 - 2024 Red Hat, Inc.