[v1] io_uring: fix short read slow path corruptions

[PATCH] io_uring: fix short read slow path corruptions

Posted by Dominique Martinet 3 years, 7 months ago

sqeq.off here is the offset to read within the disk image, so obviously
not 'nread' (the amount we just read), but as the author meant to write
its current value incremented by the amount we just read.

Normally recent versions of linux will not issue short reads,
but apparently btrfs with O_DIRECT (cache=none) does.

This lead to weird image corruptions when short read happened

Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---

I just spent a couple of days on this bug, will follow up with kernel to
see if we can also not get rid of the short read but perhaps a warning
should be added the first time we get a short read, as it's not supposed
to happen?
Well, slow path now seems to work (at least my VM now boots fine), but
if the code clearly states it should never be used I assume there might
be other bugs laying there as it's not tested... That this one was easy
enough to spot once I noticed the short reads was its only grace...

Thanks!

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..d58aff9615ce 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-- 
2.35.1

[PATCH v2] io_uring: fix short read slow path

Posted by Dominique Martinet 3 years, 7 months ago

sqeq.off here is the offset to read within the disk image, so obviously
not 'nread' (the amount we just read), but as the author meant to write
its current value incremented by the amount we just read.

Normally recent versions of linux will not issue short reads,
but it can happen so we should fix this.

This lead to weird image corruptions when short read happened

Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
v1 -> v2: also updated total_read to use += as suggested by Kevin,
thank you!

I've tested this quickly by making short reads "recursives", e.g. added
the following to luring_resubmit_short_read() after setting 'remaining':
    if (remaining > 4096) remaining -= 4096;

so when we ask for more we issue an extra short reads, making sure we go
through the two short reads path.
(Unfortunately I wasn't quite sure what to fiddle with to issue short
reads in the first place, I tried cutting one of the iovs short in
luring_do_submit() but I must not have been doing it properly as I ended
up with 0 return values which are handled by filling in with 0 (reads
after eof) and that didn't work well)

Anyway, this looks OK to me now.

Thanks,
Dominique

 block/io_uring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..b238661740f5 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -89,7 +89,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
     trace_luring_resubmit_short_read(s, luringcb, nread);
 
     /* Update read position */
-    luringcb->total_read = nread;
+    luringcb->total_read += nread;
     remaining = luringcb->qiov->size - luringcb->total_read;
 
     /* Shorten qiov */
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-- 
2.35.1

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Stefan Hajnoczi 3 years, 7 months ago

On Thu, Jun 30, 2022 at 10:01:37AM +0900, Dominique Martinet wrote:
> sqeq.off here is the offset to read within the disk image, so obviously
> not 'nread' (the amount we just read), but as the author meant to write
> its current value incremented by the amount we just read.
> 
> Normally recent versions of linux will not issue short reads,
> but it can happen so we should fix this.
> 
> This lead to weird image corruptions when short read happened
> 
> Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
> Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> v1 -> v2: also updated total_read to use += as suggested by Kevin,
> thank you!
> 
> I've tested this quickly by making short reads "recursives", e.g. added
> the following to luring_resubmit_short_read() after setting 'remaining':
>     if (remaining > 4096) remaining -= 4096;
> 
> so when we ask for more we issue an extra short reads, making sure we go
> through the two short reads path.
> (Unfortunately I wasn't quite sure what to fiddle with to issue short
> reads in the first place, I tried cutting one of the iovs short in
> luring_do_submit() but I must not have been doing it properly as I ended
> up with 0 return values which are handled by filling in with 0 (reads
> after eof) and that didn't work well)
> 
> Anyway, this looks OK to me now.
> 
> Thanks,
> Dominique
> 
>  block/io_uring.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Stefano Garzarella 3 years, 7 months ago

On Thu, Jun 30, 2022 at 10:01:37AM +0900, Dominique Martinet wrote:
>sqeq.off here is the offset to read within the disk image, so obviously
>not 'nread' (the amount we just read), but as the author meant to write
>its current value incremented by the amount we just read.
>
>Normally recent versions of linux will not issue short reads,
>but it can happen so we should fix this.
>
>This lead to weird image corruptions when short read happened
>
>Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
>Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
>Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>

Thanks for fixing this issue!

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>---
>v1 -> v2: also updated total_read to use += as suggested by Kevin,
>thank you!
>
>I've tested this quickly by making short reads "recursives", e.g. added
>the following to luring_resubmit_short_read() after setting 'remaining':
>    if (remaining > 4096) remaining -= 4096;
>
>so when we ask for more we issue an extra short reads, making sure we go
>through the two short reads path.
>(Unfortunately I wasn't quite sure what to fiddle with to issue short
>reads in the first place, I tried cutting one of the iovs short in
>luring_do_submit() but I must not have been doing it properly as I ended
>up with 0 return values which are handled by filling in with 0 (reads
>after eof) and that didn't work well)

Do you remember the kernel version where you first saw these problems?

Thanks,
Stefano

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Dominique Martinet 3 years, 7 months ago

Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> > so when we ask for more we issue an extra short reads, making sure we go
> > through the two short reads path.
> > (Unfortunately I wasn't quite sure what to fiddle with to issue short
> > reads in the first place, I tried cutting one of the iovs short in
> > luring_do_submit() but I must not have been doing it properly as I ended
> > up with 0 return values which are handled by filling in with 0 (reads
> > after eof) and that didn't work well)
> 
> Do you remember the kernel version where you first saw these problems?

Since you're quoting my paragraph about testing two short reads, I've
never seen any that I know of; but there's also no reason these couldn't
happen.

Single short reads have been happening for me with O_DIRECT (cache=none)
on btrfs for a while, but unfortunately I cannot remember which was the
first kernel I've seen this on -- I think rather than a kernel update it
was due to file manipulations that made the file eligible for short
reads in the first place (I started running deduplication on the backing
file)

The older kernel I have installed right now is 5.16 and that can
reproduce it --  I'll give my laptop some work over the weekend to test
still maintained stable branches if that's useful.

-- 
Dominique

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Stefan Hajnoczi 3 years, 7 months ago

On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> > > so when we ask for more we issue an extra short reads, making sure we go
> > > through the two short reads path.
> > > (Unfortunately I wasn't quite sure what to fiddle with to issue short
> > > reads in the first place, I tried cutting one of the iovs short in
> > > luring_do_submit() but I must not have been doing it properly as I ended
> > > up with 0 return values which are handled by filling in with 0 (reads
> > > after eof) and that didn't work well)
> > 
> > Do you remember the kernel version where you first saw these problems?
> 
> Since you're quoting my paragraph about testing two short reads, I've
> never seen any that I know of; but there's also no reason these couldn't
> happen.
> 
> Single short reads have been happening for me with O_DIRECT (cache=none)
> on btrfs for a while, but unfortunately I cannot remember which was the
> first kernel I've seen this on -- I think rather than a kernel update it
> was due to file manipulations that made the file eligible for short
> reads in the first place (I started running deduplication on the backing
> file)
> 
> The older kernel I have installed right now is 5.16 and that can
> reproduce it --  I'll give my laptop some work over the weekend to test
> still maintained stable branches if that's useful.

Hi Dominique,
Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
async context"). The comment above QEMU's luring_resubmit_short_read()
claims that short reads are a bug that was fixed by Linux commit
9d93a3f5a0c.

If the comment is inaccurate it needs to be fixed. Maybe short writes
need to be handled too.

I have CCed Jens and the io_uring mailing list to clarify:
1. Are short IORING_OP_READV reads possible on files/block devices?
2. Are short IORING_OP_WRITEV writes possible on files/block devices?

Thanks,
Stefan

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Dominique Martinet 3 years, 7 months ago

Stefan Hajnoczi wrote on Tue, Jul 05, 2022 at 02:28:08PM +0100:
> > The older kernel I have installed right now is 5.16 and that can
> > reproduce it --  I'll give my laptop some work over the weekend to test
> > still maintained stable branches if that's useful.
> 
> Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> async context"). The comment above QEMU's luring_resubmit_short_read()
> claims that short reads are a bug that was fixed by Linux commit
> 9d93a3f5a0c.
> 
> If the comment is inaccurate it needs to be fixed. Maybe short writes
> need to be handled too.
> 
> I have CCed Jens and the io_uring mailing list to clarify:
> 1. Are short IORING_OP_READV reads possible on files/block devices?
> 2. Are short IORING_OP_WRITEV writes possible on files/block devices?

Jens replied before me, so I won't be adding much (I agree with his
reply -- linux tries hard to avoid short reads but we should assume they
can happen)

In this particular case it was another btrfs bug with O_DIRECT and mixed
compression in a file, that's been fixed by this patch:
https://lore.kernel.org/all/20220630151038.GA459423@falcondesktop/

queued here:
https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=dio_fixes&id=b3864441547e49a69d45c7771aa8cc5e595d18fc

It should be backported to 5.10, but the problem will likely persist in
5.4 kernels if anyone runs on that as the code changed enough to make
backporting non-trivial.


So, WRT that comment, we probably should remove the reference to that
commit and leave in that they should be very rare but we need to handle
them anyway.


For writes in particular, I haven't seen any and looking at the code
qemu would blow up that storage (IO treated as ENOSPC would likely mark
disk read-only?)
It might make sense to add some warning message that it's what happened
so it'll be obvious what needs doing in case anyone falls on that but I
think the status-quo is good enough here.

-- 
Dominique

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Stefan Hajnoczi 3 years, 7 months ago

On Tue, 5 Jul 2022 at 23:53, Dominique Martinet
<dominique.martinet@atmark-techno.com> wrote:
>
> Stefan Hajnoczi wrote on Tue, Jul 05, 2022 at 02:28:08PM +0100:
> > > The older kernel I have installed right now is 5.16 and that can
> > > reproduce it --  I'll give my laptop some work over the weekend to test
> > > still maintained stable branches if that's useful.
> >
> > Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> > async context"). The comment above QEMU's luring_resubmit_short_read()
> > claims that short reads are a bug that was fixed by Linux commit
> > 9d93a3f5a0c.
> >
> > If the comment is inaccurate it needs to be fixed. Maybe short writes
> > need to be handled too.
> >
> > I have CCed Jens and the io_uring mailing list to clarify:
> > 1. Are short IORING_OP_READV reads possible on files/block devices?
> > 2. Are short IORING_OP_WRITEV writes possible on files/block devices?
>
> Jens replied before me, so I won't be adding much (I agree with his
> reply -- linux tries hard to avoid short reads but we should assume they
> can happen)
>
> In this particular case it was another btrfs bug with O_DIRECT and mixed
> compression in a file, that's been fixed by this patch:
> https://lore.kernel.org/all/20220630151038.GA459423@falcondesktop/
>
> queued here:
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=dio_fixes&id=b3864441547e49a69d45c7771aa8cc5e595d18fc
>
> It should be backported to 5.10, but the problem will likely persist in
> 5.4 kernels if anyone runs on that as the code changed enough to make
> backporting non-trivial.
>
>
> So, WRT that comment, we probably should remove the reference to that
> commit and leave in that they should be very rare but we need to handle
> them anyway.
>
>
> For writes in particular, I haven't seen any and looking at the code
> qemu would blow up that storage (IO treated as ENOSPC would likely mark
> disk read-only?)
> It might make sense to add some warning message that it's what happened
> so it'll be obvious what needs doing in case anyone falls on that but I
> think the status-quo is good enough here.

Great! I've already queued your fix.

Do you want to send a follow-up that updates the comment?

Thanks,
Stefan

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Dominique Martinet 3 years, 7 months ago

Stefan Hajnoczi wrote on Wed, Jul 06, 2022 at 08:17:42AM +0100:
> Great! I've already queued your fix.

Thanks!

> Do you want to send a follow-up that updates the comment?

I don't think I'd add much value at this point, leaving it to you unless
you really would prefer me to send it.


Cheers,
-- 
Dominique

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Stefan Hajnoczi 3 years, 7 months ago

On Wed, Jul 06, 2022 at 04:26:59PM +0900, Dominique Martinet wrote:
> Stefan Hajnoczi wrote on Wed, Jul 06, 2022 at 08:17:42AM +0100:
> > Great! I've already queued your fix.
> 
> Thanks!
> 
> > Do you want to send a follow-up that updates the comment?
> 
> I don't think I'd add much value at this point, leaving it to you unless
> you really would prefer me to send it.

That's fine. I'll send a patch. Thanks!

Stefan

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Jens Axboe 3 years, 7 months ago

On 7/5/22 7:28 AM, Stefan Hajnoczi wrote:
> On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
>> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
>>>> so when we ask for more we issue an extra short reads, making sure we go
>>>> through the two short reads path.
>>>> (Unfortunately I wasn't quite sure what to fiddle with to issue short
>>>> reads in the first place, I tried cutting one of the iovs short in
>>>> luring_do_submit() but I must not have been doing it properly as I ended
>>>> up with 0 return values which are handled by filling in with 0 (reads
>>>> after eof) and that didn't work well)
>>>
>>> Do you remember the kernel version where you first saw these problems?
>>
>> Since you're quoting my paragraph about testing two short reads, I've
>> never seen any that I know of; but there's also no reason these couldn't
>> happen.
>>
>> Single short reads have been happening for me with O_DIRECT (cache=none)
>> on btrfs for a while, but unfortunately I cannot remember which was the
>> first kernel I've seen this on -- I think rather than a kernel update it
>> was due to file manipulations that made the file eligible for short
>> reads in the first place (I started running deduplication on the backing
>> file)
>>
>> The older kernel I have installed right now is 5.16 and that can
>> reproduce it --  I'll give my laptop some work over the weekend to test
>> still maintained stable branches if that's useful.
> 
> Hi Dominique,
> Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> async context"). The comment above QEMU's luring_resubmit_short_read()
> claims that short reads are a bug that was fixed by Linux commit
> 9d93a3f5a0c.
> 
> If the comment is inaccurate it needs to be fixed. Maybe short writes
> need to be handled too.
> 
> I have CCed Jens and the io_uring mailing list to clarify:
> 1. Are short IORING_OP_READV reads possible on files/block devices?
> 2. Are short IORING_OP_WRITEV writes possible on files/block devices?

In general we try very hard to avoid them, but if eg we get a short read
or write from blocking context (eg io-wq), then io_uring does return
that. There's really not much we can do here, it seems futile to retry
IO which was issued just like it would've been from a normal blocking
syscall yet it is still short.

-- 
Jens Axboe

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Stefan Hajnoczi 3 years, 7 months ago

On Tue, 5 Jul 2022 at 20:26, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 7/5/22 7:28 AM, Stefan Hajnoczi wrote:
> > On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
> >> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> >>>> so when we ask for more we issue an extra short reads, making sure we go
> >>>> through the two short reads path.
> >>>> (Unfortunately I wasn't quite sure what to fiddle with to issue short
> >>>> reads in the first place, I tried cutting one of the iovs short in
> >>>> luring_do_submit() but I must not have been doing it properly as I ended
> >>>> up with 0 return values which are handled by filling in with 0 (reads
> >>>> after eof) and that didn't work well)
> >>>
> >>> Do you remember the kernel version where you first saw these problems?
> >>
> >> Since you're quoting my paragraph about testing two short reads, I've
> >> never seen any that I know of; but there's also no reason these couldn't
> >> happen.
> >>
> >> Single short reads have been happening for me with O_DIRECT (cache=none)
> >> on btrfs for a while, but unfortunately I cannot remember which was the
> >> first kernel I've seen this on -- I think rather than a kernel update it
> >> was due to file manipulations that made the file eligible for short
> >> reads in the first place (I started running deduplication on the backing
> >> file)
> >>
> >> The older kernel I have installed right now is 5.16 and that can
> >> reproduce it --  I'll give my laptop some work over the weekend to test
> >> still maintained stable branches if that's useful.
> >
> > Hi Dominique,
> > Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> > async context"). The comment above QEMU's luring_resubmit_short_read()
> > claims that short reads are a bug that was fixed by Linux commit
> > 9d93a3f5a0c.
> >
> > If the comment is inaccurate it needs to be fixed. Maybe short writes
> > need to be handled too.
> >
> > I have CCed Jens and the io_uring mailing list to clarify:
> > 1. Are short IORING_OP_READV reads possible on files/block devices?
> > 2. Are short IORING_OP_WRITEV writes possible on files/block devices?
>
> In general we try very hard to avoid them, but if eg we get a short read
> or write from blocking context (eg io-wq), then io_uring does return
> that. There's really not much we can do here, it seems futile to retry
> IO which was issued just like it would've been from a normal blocking
> syscall yet it is still short.

Thanks! QEMU's short I/O handling is spotty - some code paths handle
it while others don't. For the io_uring QEMU block driver we'll try to
handle short all I/Os.

Stefan

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Dominique Martinet 3 years, 7 months ago

Dominique Martinet wrote on Fri, Jul 01, 2022 at 07:52:31AM +0900:
> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> > > so when we ask for more we issue an extra short reads, making sure we go
> > > through the two short reads path.
> > > (Unfortunately I wasn't quite sure what to fiddle with to issue short
> > > reads in the first place, I tried cutting one of the iovs short in
> > > luring_do_submit() but I must not have been doing it properly as I ended
> > > up with 0 return values which are handled by filling in with 0 (reads
> > > after eof) and that didn't work well)
> > 
> > Do you remember the kernel version where you first saw these problems?
> 
> Since you're quoting my paragraph about testing two short reads, I've
> never seen any that I know of; but there's also no reason these couldn't
> happen.
> 
> Single short reads have been happening for me with O_DIRECT (cache=none)
> on btrfs for a while, but unfortunately I cannot remember which was the
> first kernel I've seen this on -- I think rather than a kernel update it
> was due to file manipulations that made the file eligible for short
> reads in the first place (I started running deduplication on the backing
> file)
> 
> The older kernel I have installed right now is 5.16 and that can
> reproduce it --  I'll give my laptop some work over the weekend to test
> still maintained stable branches if that's useful.

I can confirm the same behavior happens with 5.4.202
I'm not sure about older kernels as my io_uring test that reproduces
short reads just doesn't start on these, but for all intent and purposes
we can probably say this has just always been possible.

The fix Filipe sent me unfortunately doesn't apply as far back as 5.4
(btrfs_get_blocks_direct had no flags argument back then, that got
introduced with conversion to dio in 5.8), but should probably work as
is for 5.10/15 kernels.

-- 
Dominique

Re: [PATCH v2] io_uring: fix short read slow path

Posted by Hanna Reitz 3 years, 7 months ago

On 30.06.22 03:01, Dominique Martinet wrote:
> sqeq.off here is the offset to read within the disk image, so obviously
> not 'nread' (the amount we just read), but as the author meant to write
> its current value incremented by the amount we just read.
>
> Normally recent versions of linux will not issue short reads,
> but it can happen so we should fix this.
>
> This lead to weird image corruptions when short read happened
>
> Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
> Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> v1 -> v2: also updated total_read to use += as suggested by Kevin,
> thank you!
>
> I've tested this quickly by making short reads "recursives", e.g. added
> the following to luring_resubmit_short_read() after setting 'remaining':
>      if (remaining > 4096) remaining -= 4096;
>
> so when we ask for more we issue an extra short reads, making sure we go
> through the two short reads path.
> (Unfortunately I wasn't quite sure what to fiddle with to issue short
> reads in the first place, I tried cutting one of the iovs short in
> luring_do_submit() but I must not have been doing it properly as I ended
> up with 0 return values which are handled by filling in with 0 (reads
> after eof) and that didn't work well)
>
> Anyway, this looks OK to me now.
>
> Thanks,
> Dominique
>
>   block/io_uring.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>

[PATCH] io_uring: fix short read slow path corruptions

Posted by Dominique Martinet 3 years, 7 months ago

sqeq.off here is the offset to read within the disk image, so obviously
not 'nread' (the amount we just read), but as the author meant to write
its current value incremented by the amount we just read.

Normally recent versions of linux will not issue short reads,
but apparently btrfs with O_DIRECT (cache=none) does.

This lead to weird image corruptions when short read happened

Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
Forgive the double mail if it gets to you twice: I missed Ccs on the first
try, I should have known better...

I just spent a couple of days on this bug, will follow up with kernel to
see if we can also not get rid of the short read but perhaps a warning
should be added the first time we get a short read, as it's not supposed
to happen?
Well, slow path now seems to work (at least my VM now boots fine), but
if the code clearly states it should never be used I assume there might
be other bugs laying there as it's not tested... That this one was easy
enough to spot once I noticed the short reads was its only grace...

Thanks!

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..d58aff9615ce 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-- 
2.35.1

Re: [PATCH] io_uring: fix short read slow path corruptions

Posted by Kevin Wolf 3 years, 7 months ago

Am 29.06.2022 um 07:23 hat Dominique Martinet geschrieben:
> sqeq.off here is the offset to read within the disk image, so obviously
> not 'nread' (the amount we just read), but as the author meant to write
> its current value incremented by the amount we just read.
> 
> Normally recent versions of linux will not issue short reads,
> but apparently btrfs with O_DIRECT (cache=none) does.
> 
> This lead to weird image corruptions when short read happened
> 
> Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
> Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> Forgive the double mail if it gets to you twice: I missed Ccs on the first
> try, I should have known better...
> 
> I just spent a couple of days on this bug, will follow up with kernel to
> see if we can also not get rid of the short read but perhaps a warning
> should be added the first time we get a short read, as it's not supposed
> to happen?
> Well, slow path now seems to work (at least my VM now boots fine), but
> if the code clearly states it should never be used I assume there might
> be other bugs laying there as it's not tested... That this one was easy
> enough to spot once I noticed the short reads was its only grace...
> 
> Thanks!
> 
>  block/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index d48e472e74cb..d58aff9615ce 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
>                        remaining);
>  
>      /* Update sqe */
> -    luringcb->sqeq.off = nread;
> +    luringcb->sqeq.off += nread;
>      luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
>      luringcb->sqeq.len = luringcb->resubmit_qiov.niov;

I see this a few lines above:

    /* Update read position */
    luringcb->total_read = nread;

Doesn't it have the same problem? Though maybe getting two short reads
is more of a theoretical case.

Kevin

Re: [PATCH] io_uring: fix short read slow path corruptions

Posted by Dominique Martinet 3 years, 7 months ago

Kevin Wolf wrote on Wed, Jun 29, 2022 at 10:46:08AM +0200:
> I see this a few lines above:
> 
>     /* Update read position */
>     luringcb->total_read = nread;
> 
> Doesn't it have the same problem? Though maybe getting two short reads
> is more of a theoretical case.

Good catch, I'll send a v2 with that one adjusted as well tomorrow -- I
had logs and can confirm I didn't get two short reads in a row...

I wonder if there'd be a way to test? E.g. act like we've requested n
bytes but actually only fill in a bit less in the iov?

-- 
Dominique