The read result collection for buffered reads seems to run ahead of the
completion of subrequests under some circumstances, as can be seen in the
following log snippet:
9p_client_res: client 18446612686390831168 response P9_TREAD tag 0 err 0
...
netfs_sreq: R=00001b55[1] DOWN TERM f=192 s=0 5fb2/5fb2 s=5 e=0
...
netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2
netfs_folio: i=157f3 ix=00004-00004 read-done
netfs_folio: i=157f3 ix=00004-00004 read-unlock
netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2
netfs_folio: i=157f3 ix=00005-00005 read-done
netfs_folio: i=157f3 ix=00005-00005 read-unlock
...
netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c
netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8
...
netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0
netfs_sreq: R=00001b55[2] ZERO TERM f=102 s=5fb2 4e/4e s=5 e=0
The 'cto=5fb2' indicates the file position we've collected results up to
so far - but we still have 0x4e more bytes to go, so we shouldn't have
collected folio ix=00005 yet. The 'ZERO' subreq that clears the tail
happens after we unlock the folio, allowing the application to see the
uncleared tail through mmap.
The problem is that netfs_read_unlock_folios() will unlock a folio as soon
as the amount of read results collected reaches the EOF position within
that folio - but the ZERO subreq lies beyond EOF and so completes after
the unlock.
Fix this by changing the end check to always be the end of the folio and
never the end of the file.
In the future, I should look at clearing to the end of the folio here rather
than adding a ZERO subreq to do this. On the other hand, the ZERO subreq can
run in parallel with an async READ subreq. Further, the ZERO subreq may still
be necessary to, say, handle extents in a ceph file that don't have any
backing store and are thus implicitly all zeros.
This can be reproduced by creating a file whose size doesn't align to a
page boundary, e.g. 24498 (0x5fb2) bytes, and then doing something like:
xfs_io -c "mmap -r 0 0x6000" -c "madvise -d 0 0x6000" \
-c "mread -v 0 0x6000" /xfstest.test/x
The last 0x4e bytes should all be 00, but if the tail hasn't been cleared
yet, you may see rubbish there. This can be reproduced with kafs by
modifying the kernel to disable the call to netfs_read_subreq_progress()
and to stop afs_issue_read() from doing the async call for NETFS_READAHEAD.
Reproduction can be made easier by inserting an mdelay(100) in
netfs_issue_read() for the ZERO-subreq case.
AFS and CIFS are normally unlikely to show this as they dispatch READ ops
asynchronously, which allows the ZERO-subreq to finish first. 9P's READ op
is completely synchronous, so the ZERO-subreq will always complete
afterwards. It isn't seen every time, though, because the collection may
be done in a worker thread.
Reported-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Link: https://lore.kernel.org/r/8622834.T7Z3S40VBb@weasel/
Signed-off-by: David Howells <dhowells@redhat.com>
Suggested-by: Dominique Martinet <asmadeus@codewreck.org>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/read_collect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index a95e7aadafd0..7a0ffa675fb1 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -137,7 +137,7 @@ static void netfs_read_unlock_folios(struct netfs_io_request *rreq,
 		rreq->front_folio_order = order;
 		fsize = PAGE_SIZE << order;
 		fpos = folio_pos(folio);
-		fend = umin(fpos + fsize, rreq->i_size);
+		fend = fpos + fsize;
 
 		trace_netfs_collect_folio(rreq, folio, fend, collected_to);
On Saturday, 20 December 2025 13:31:40 CET David Howells wrote:
> [...]
I had bisected this mmap() data corruption to e2d46f2ec332 ("netfs: Change the
read result collector to only use one work item"). So maybe adding a Fixes:
tag for this as suggested by Dominique?
With the patch applied, this issue disappeared. Give me some hours for more
thorough tests, due to the random factor involved.
> diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
> index a95e7aadafd0..7a0ffa675fb1 100644
> --- a/fs/netfs/read_collect.c
> +++ b/fs/netfs/read_collect.c
> @@ -137,7 +137,7 @@ static void netfs_read_unlock_folios(struct netfs_io_request *rreq,
>  		rreq->front_folio_order = order;
>  		fsize = PAGE_SIZE << order;
>  		fpos = folio_pos(folio);
> -		fend = umin(fpos + fsize, rreq->i_size);
> +		fend = fpos + fsize;
>  
>  		trace_netfs_collect_folio(rreq, folio, fend, collected_to);
What about write_collect.c side, is it safe as is?
/Christian
Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> What about write_collect.c side, is it safe as is?

That's a good question - but it should be safe since it doesn't modify the
content of the page and doesn't typically write beyond the EOF (though it
might write a block that spans the EOF marker to the cache).

David
Christian Schoenebeck wrote on Sat, Dec 20, 2025 at 03:55:09PM +0100:
> I had bisected this mmap() data corruption to e2d46f2ec332 ("netfs: Change the
> read result collector to only use one work item"). So maybe adding a Fixes:
> tag for this as suggested by Dominique?
Yes, I had also confirmed the bug starts occurring since that commit (and
the fix works on that point in the tree as well), please add:
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
That aside, your commit message looks good to me:
Acked-by: Dominique Martinet <asmadeus@codewreck.org>
Thank you for having taken the time to look at it!
--
Dominique Martinet | Asmadeus
On Saturday, 20 December 2025 15:55:09 CET Christian Schoenebeck wrote:
> On Saturday, 20 December 2025 13:31:40 CET David Howells wrote:
> > The read result collection for buffered reads seems to run ahead of the
> > completion of subrequests under some circumstances, as can be seen in the
[...]
> With the patch applied, this issue disappeared. Give me some hours for more
> thorough tests, due to the random factor involved.

Dominique, David, looks good! Thanks!

Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com>
On Sat, 20 Dec 2025 12:31:40 +0000, David Howells wrote:
> The read result collection for buffered reads seems to run ahead of the
> completion of subrequests under some circumstances, as can be seen in the
> following log snippet:
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/1] netfs: Fix early read unlock of page with EOF in middle
https://git.kernel.org/vfs/vfs/c/570ad253a345