A FUSE server that advertises a large max_pages and max_write (e.g.
max_pages=256, max_write=1MB) cannot currently obtain matching
FUSE_READ request sizes from the kernel. Buffered sequential writes
arrive at the server at the negotiated max_write size, but buffered
sequential reads remain capped at the kernel's default readahead
window (VM_READAHEAD_PAGES, 128KB; doubled to 256KB for files marked
POSIX_FADV_SEQUENTIAL). A 1MB application read() therefore turns
into four sequential 256KB FUSE_READ round-trips instead of one.

This is because process_init_reply() processes the server's
max_readahead response as:

    ra_pages = arg->max_readahead / PAGE_SIZE;
    fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);

Since the kernel sends its current bdi->ra_pages as
init_in->max_readahead, and bdi->ra_pages is the default
VM_READAHEAD_PAGES at this point, the server can only ever decrease
the readahead window -- never increase it. Even if the server
replies with max_readahead=1MB, the min() clamps it back to 128KB.

This clamp dates to commit 9cd684551124 ("[PATCH] fuse: fix async
read for legacy filesystems"), which introduced max_readahead at FUSE
protocol 7.6 and used min() to preserve legacy (<7.6) filesystem
behaviour. Modern filesystems that explicitly advertise a larger
max_readahead are silently overridden.

Other filesystems set ra_pages or io_pages directly from negotiated
server/device capabilities: cifs sets ra_pages from rsize/rasize,
ceph from rasize/rsize mount options, 9p from maxdata, and nfs sets
io_pages from rpages.

Use the server's max_readahead response directly, bounded by
fc->max_pages (which is itself bounded by fc->max_pages_limit and,
for virtio-fs, by the virtqueue descriptor count):

    fm->sb->s_bdi->ra_pages = min_t(unsigned int, ra_pages,
                                    fc->max_pages);

This is backward compatible:

- Servers that echo init_in->max_readahead back unchanged see the
  same effective readahead as today.
- Servers that reply with a smaller value still reduce ra_pages.
- Servers that do not negotiate FUSE_MAX_PAGES see no change, since
  fc->max_pages defaults to FUSE_DEFAULT_MAX_PAGES_PER_REQ (32),
  matching VM_READAHEAD_PAGES.
- Only servers that both negotiate FUSE_MAX_PAGES and advertise a
  larger max_readahead see the new behaviour, and in that case
  fc->max_pages already gates per-request data size.
Signed-off-by: Jim Harris <jim.harris@nvidia.com>
Assisted-by: Cursor:claude-opus-4.7
---
Notes on AI assistance:
The code analysis (tracing the readahead negotiation in
process_init_reply(), confirming the behaviour of ractl_max_pages()
in mm/readahead.c, and surveying how other filesystems set
ra_pages/io_pages) and the bulk of this changelog were drafted with
an AI coding assistant (see Assisted-by trailer). The one-line code
change was reviewed by me. The motivating performance observation
(a 1MB application read producing four 256KB FUSE_READ requests
against a server advertising max_pages=256 and max_write=1MB) was
observed by me on a real virtio-fs workload prior to any AI
involvement, and verification of patched and unpatched behaviour
was performed by me.
fs/fuse/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index deddfffb037f..272026f11a34 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1494,7 +1494,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
init_server_timeout(fc, timeout);
fm->sb->s_bdi->ra_pages =
- min(fm->sb->s_bdi->ra_pages, ra_pages);
+ min_t(unsigned int, ra_pages, fc->max_pages);
fc->minor = arg->minor;
fc->max_write = arg->minor < 5 ? 4096 : arg->max_write;
fc->max_write = max_t(unsigned, 4096, fc->max_write);
--
2.43.0
On Tue, Jun 2, 2026 at 2:14 PM Jim Harris <jim.harris@nvidia.com> wrote:
>
> A FUSE server that advertises a large max_pages and max_write (e.g.
> max_pages=256, max_write=1MB) cannot currently obtain matching
> FUSE_READ request sizes from the kernel. Buffered sequential writes
> arrive at the server at the negotiated max_write size, but buffered
> sequential reads remain capped at the kernel's default readahead
> window (VM_READAHEAD_PAGES, 128KB; doubled to 256KB for files marked
> POSIX_FADV_SEQUENTIAL). A 1MB application read() therefore turns
> into four sequential 256KB FUSE_READ round-trips instead of one.
>
> This is because process_init_reply() processes the server's
> max_readahead response as:
>
> ra_pages = arg->max_readahead / PAGE_SIZE;
> fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);
>
> Since the kernel sends its current bdi->ra_pages as
> init_in->max_readahead, and bdi->ra_pages is the default
> VM_READAHEAD_PAGES at this point, the server can only ever decrease
> the readahead window -- never increase it. Even if the server
> replies with max_readahead=1MB, the min() clamps it back to 128KB.
>
> This clamp dates to commit 9cd684551124 ("[PATCH] fuse: fix async
> read for legacy filesystems"), which introduced max_readahead at FUSE
> protocol 7.6 and used min() to preserve legacy (<7.6) filesystem
> behaviour. Modern filesystems that explicitly advertise a larger
> max_readahead are silently overridden.
>
> Other filesystems set ra_pages or io_pages directly from negotiated
> server/device capabilities: cifs sets ra_pages from rsize/rasize,
> ceph from rasize/rsize mount options, 9p from maxdata, and nfs sets
> io_pages from rpages.
>
> Use the server's max_readahead response directly, bounded by
> fc->max_pages (which is itself bounded by fc->max_pages_limit and,
> for virtio-fs, by the virtqueue descriptor count):
>
> fm->sb->s_bdi->ra_pages = min_t(unsigned int, ra_pages,
> fc->max_pages);
>
> This is backward compatible:
>
> - Servers that echo init_in->max_readahead back unchanged see the
> same effective readahead as today.
> - Servers that reply with a smaller value still reduce ra_pages.
> - Servers that do not negotiate FUSE_MAX_PAGES see no change, since
> fc->max_pages defaults to FUSE_DEFAULT_MAX_PAGES_PER_REQ (32),
> matching VM_READAHEAD_PAGES.
> - Only servers that both negotiate FUSE_MAX_PAGES and advertise a
> larger max_readahead see the new behaviour, and in that case
> fc->max_pages already gates per-request data size.
>
> Signed-off-by: Jim Harris <jim.harris@nvidia.com>
> Assisted-by: Cursor:claude-opus-4.7
> ---
> Notes on AI assistance:
>
> The code analysis (tracing the readahead negotiation in
> process_init_reply(), confirming the behaviour of ractl_max_pages()
> in mm/readahead.c, and surveying how other filesystems set
> ra_pages/io_pages) and the bulk of this changelog were drafted with
> an AI coding assistant (see Assisted-by trailer). The one-line code
> change was reviewed by me. The motivating performance observation
> (a 1MB application read producing four 256KB FUSE_READ requests
> against a server advertising max_pages=256 and max_write=1MB) was
> observed by me on a real virtio-fs workload prior to any AI
> involvement, and verification of patched and unpatched behaviour
> was performed by me.
>
> fs/fuse/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index deddfffb037f..272026f11a34 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1494,7 +1494,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> init_server_timeout(fc, timeout);
>
> fm->sb->s_bdi->ra_pages =
> - min(fm->sb->s_bdi->ra_pages, ra_pages);
> + min_t(unsigned int, ra_pages, fc->max_pages);

Looking at how the mm code uses ra_pages, I think this will also have
the side effect of upping the number of pages read into the page cache
for speculative readahead. I don't think this is safe, at least not
for unprivileged servers.

I think setting s_bdi->io_pages would be a better fit here. From the
logic in page_cache_sync_ra() -> ractl_max_pages(), this would only
exceed the readahead window size limit for non-speculative readahead.

Thanks,
Joanne
> fc->minor = arg->minor;
> fc->max_write = arg->minor < 5 ? 4096 : arg->max_write;
> fc->max_write = max_t(unsigned, 4096, fc->max_write);
> --
> 2.43.0
>
>