[PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block

ChangLimin posted 1 patch 3 years, 1 month ago
Failed in applying to current master (apply log)
block/file-posix.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
[PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block
Posted by ChangLimin 3 years, 1 month ago
For Linux 5.10/5.11, qemu write zeros to a multipath device using
ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
permanently. Fallback to pwritev instead of exit for -EBUSY error.

The issue was introduced in Linux 5.10:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02

Fixed in Linux 5.12:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d

Signed-off-by: ChangLimin <changlm@chinatelecom.cn>
---
 block/file-posix.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 20e14f8e96..d4054ac9cb 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1624,8 +1624,12 @@ static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
         } while (errno == EINTR);

         ret = translate_err(-errno);
-        if (ret == -ENOTSUP) {
-            s->has_write_zeroes = false;
+        switch (ret) {
+        case -ENOTSUP:
+            s->has_write_zeroes = false; /* fall through */
+        case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath devices */
+            return -ENOTSUP;
+            break;
         }
     }
 #endif
--
2.27.0

Re: [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block
Posted by John Snow 3 years, 1 month ago
On 3/22/21 5:25 AM, ChangLimin wrote:
> For Linux 5.10/5.11, qemu write zeros to a multipath device using
> ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
> permanently. Fallback to pwritev instead of exit for -EBUSY error.
> 
> The issue was introduced in Linux 5.10:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02
> 
> Fixed in Linux 5.12:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
> 
> Signed-off-by: ChangLimin <changlm@chinatelecom.cn>

To be clear, when I asked "When do we get -EINVAL?" it wasn't because I 
doubted that we would ever get it, I was just unclear of the 
circumstances in which we might receive EINVAL and was hoping you would 
explain it to me.

> ---
>   block/file-posix.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 20e14f8e96..d4054ac9cb 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1624,8 +1624,12 @@ static ssize_t 
> handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
>           } while (errno == EINTR);
> 
>           ret = translate_err(-errno);
> -        if (ret == -ENOTSUP) {
> -            s->has_write_zeroes = false;
> +        switch (ret) {
> +        case -ENOTSUP:
> +            s->has_write_zeroes = false; /* fall through */
> +        case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath 
> devices */
> +            return -ENOTSUP;
> +            break;

What effect does this have, now?

We'll return ENOTSUP but we won't disable trying it again in the future, 
is that right?

Kevin, is this what you had in mind?

--js

>           }
>       }
>   #endif
> --
> 2.27.0
> 


Re: [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block
Posted by Max Reitz 3 years, 1 month ago
On 22.03.21 10:25, ChangLimin wrote:
> For Linux 5.10/5.11, qemu write zeros to a multipath device using
> ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
> permanently.

So as far as I can track back the discussion, Kevin asked on v1 why we’d 
set has_write_zeroes to false, i.e. whether the EBUSY might not go away 
at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then.
You haven’t explicitly replied to that question (as far as I can see), 
so it kind of still stands.

Implicitly, there are two conflicting answers in this patch: On one 
hand, the commit message says “permanently”, and this is what you told 
Nir as a realistic case where this can occur.  So that to me implies 
that we actually should not retry BLKZEROOUT, because the EBUSY will 
remain, and that condition won’t change while the block device is in use 
by qemu.

On the other hand, in the code, you have decided not to reset 
has_write_zeroes to false, so the implementation will retry.

So I don’t quite understand.  Should we keep trying BLKZEROOUT or is 
there no chance of it working after it has at one point failed with 
EBUSY?  (Are there other cases besides what’s described in this commit 
message where EBUSY might be returned and it is only temporary?)

> Fallback to pwritev instead of exit for -EBUSY error.
> 
> The issue was introduced in Linux 5.10:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02
> 
> Fixed in Linux 5.12:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
> 
> Signed-off-by: ChangLimin <changlm@chinatelecom.cn>
> ---
>   block/file-posix.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 20e14f8e96..d4054ac9cb 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1624,8 +1624,12 @@ static ssize_t 
> handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
>           } while (errno == EINTR);
> 
>           ret = translate_err(-errno);
> -        if (ret == -ENOTSUP) {
> -            s->has_write_zeroes = false;
> +        switch (ret) {
> +        case -ENOTSUP:
> +            s->has_write_zeroes = false; /* fall through */
> +        case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath 
> devices */
> +            return -ENOTSUP;
> +            break;

(Not sure why this break is here.)

Max

>           }
>       }
>   #endif
> --
> 2.27.0
> 


Re: [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block
Posted by Nir Soffer 3 years, 1 month ago
On Wed, Mar 24, 2021 at 4:52 PM Max Reitz <mreitz@redhat.com> wrote:

> On 22.03.21 10:25, ChangLimin wrote:
> > For Linux 5.10/5.11, qemu write zeros to a multipath device using
> > ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
> > permanently.
>
> So as far as I can track back the discussion, Kevin asked on v1 why we’d
> set has_write_zeroes to false, i.e. whether the EBUSY might not go away
> at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then.
> You haven’t explicitly replied to that question (as far as I can see),
> so it kind of still stands.
>
> Implicitly, there are two conflicting answers in this patch: On one
> hand, the commit message says “permanently”, and this is what you told
> Nir as a realistic case where this can occur.


I'm afraid ChangLimin did not answer my question. I'm looking for real
world used case when qemu cannot write zeros to multipath device, when
nobody else is using the device.

I tried to reproduce this on Fedora (kernel 5.10) with qemu-img convert,
once with a multipath device, and once with logical volume on a vg created
on the multipath device, and I could not reproduce this issue.

If I understand the kernel change correctly, this can happen when there is
a mounted file system on top of the multipath device. I don't think we have
a use case when qemu accesses a multipath device when the device is used
by a file system, but maybe I missed something.


> So that to me implies
> that we actually should not retry BLKZEROOUT, because the EBUSY will
> remain, and that condition won’t change while the block device is in use
> by qemu.
>
> On the other hand, in the code, you have decided not to reset
> has_write_zeroes to false, so the implementation will retry.
>

EBUSY is usually a temporary error, so retrying makes sense. The question
is if we really can write zeroes manually in this case?


> So I don’t quite understand.  Should we keep trying BLKZEROOUT or is
> there no chance of it working after it has at one point failed with
> EBUSY?  (Are there other cases besides what’s described in this commit
> message where EBUSY might be returned and it is only temporary?)
>
> > Fallback to pwritev instead of exit for -EBUSY error.
> >
> > The issue was introduced in Linux 5.10:
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02
> >
> > Fixed in Linux 5.12:
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
> >
> > Signed-off-by: ChangLimin <changlm@chinatelecom.cn>
> > ---
> >   block/file-posix.c | 8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 20e14f8e96..d4054ac9cb 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1624,8 +1624,12 @@ static ssize_t
> > handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
> >           } while (errno == EINTR);
> >
> >           ret = translate_err(-errno);
> > -        if (ret == -ENOTSUP) {
> > -            s->has_write_zeroes = false;
> > +        switch (ret) {
> > +        case -ENOTSUP:
> > +            s->has_write_zeroes = false; /* fall through */
> > +        case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath
> > devices */
> > +            return -ENOTSUP;
> > +            break;
>
> (Not sure why this break is here.)
>
> Max
>
> >           }
> >       }
> >   #endif
> > --
> > 2.27.0
> >
>
>
>