[PATCH] fuse: do not treat unlimited readdir count as a buffer size

Matthew R. Ochs posted 1 patch 1 month, 2 weeks ago
fs/fuse/readdir.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Posted by Matthew R. Ochs 1 month, 2 weeks ago
Commit dabb90391028 ("fuse: increase readdir buffer size") changed
fuse_readdir_uncached() to size its temporary buffer from ctx->count,
clamped to the negotiated FUSE maximum request size.

That is correct for normal userspace getdents callers, where ctx->count is
the userspace dirent buffer size. It is not correct for in-kernel callers
that use the VFS sentinel values documented for struct dir_context.count:
0 means unknown and INT_MAX means unlimited.

Overlayfs uses INT_MAX when reading merged directories. After
dabb90391028, FUSE interprets that sentinel as a real size request and
expands the readdir buffer to fc->max_pages << PAGE_SHIFT.

For virtiofs, the output kvec is included in the request bounce buffer
allocated by copy_args_to_argbuf():

  req->argbuf = kmalloc(len, GFP_ATOMIC);

On a 64K-page guest, this can require a multi-megabyte contiguous
GFP_ATOMIC allocation. In the failing setup, a 64K-page guest on a 4K-page
host negotiated max_pages=124, so the computed buffer was 124 * 64K, or
about 7.75MB. The same guest on a 64K-page host negotiated max_pages=16,
limiting the computed buffer to 1MB and masking the bug.
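
As a rough check of those figures, here is a minimal userspace sketch (not
kernel code) of the upper bound the pre-fix clamp reaches when ctx->count is
INT_MAX. The helper name max_readdir_buf() is hypothetical; the 64K page
size is expressed as a page shift of 16.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: models fc->max_pages << PAGE_SHIFT, the upper
 * bound of the clamp in fuse_readdir_uncached().
 */
static size_t max_readdir_buf(unsigned int max_pages, unsigned int page_shift)
{
	/* Keep the shift well defined for this illustration. */
	assert(page_shift < 32);
	return (size_t)max_pages << page_shift;
}
```

With a page shift of 16, max_readdir_buf(124, 16) gives 8126464 bytes
(7.75MB) and max_readdir_buf(16, 16) gives 1048576 bytes (1MB).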

One way to reproduce this is a 64K-page guest on a 4K-page host with an
overlayfs mount whose lower directory is on virtiofs. Reading a merged
directory through overlayfs can then fail with:

  ls: reading directory '<path>': Cannot allocate memory

Treat unknown and unlimited counts the same way fuse_readdir_uncached()
did before dabb90391028: use PAGE_SIZE. Keep the larger readdir buffer
for callers that provide a meaningful positive count.
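
The selection rule described above can be modelled in userspace as follows.
This is only a sketch under stated assumptions: model_bufsize() is a
hypothetical name, and page_size and max_bytes stand in for PAGE_SIZE and
fc->max_pages << PAGE_SHIFT, which are passed as parameters here so the
logic can be exercised outside the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the buffer-size choice:
 *  - 0 (unknown) and INT_MAX (unlimited) fall back to one page;
 *  - any other count is clamped to [page_size, max_bytes].
 */
static size_t model_bufsize(int ctx_count, size_t page_size, size_t max_bytes)
{
	unsigned int count = (unsigned int)ctx_count;

	/* The clamp only makes sense if the range is non-empty. */
	assert(page_size <= max_bytes);

	if (count == 0 || count == (unsigned int)INT_MAX)
		return page_size;
	if (count < page_size)
		return page_size;
	if (count > max_bytes)
		return max_bytes;
	return count;
}
```

For example, with 64K pages and a 7.75MB maximum, both
model_bufsize(0, 65536, 8126464) and model_bufsize(INT_MAX, 65536, 8126464)
return 65536, while a count of 1048576 is passed through unchanged.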

Fixes: dabb90391028 ("fuse: increase readdir buffer size")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 fs/fuse/readdir.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index c2aae2eef086..0e436c563efb 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -341,7 +341,10 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 	struct fuse_io_args ia = {};
 	struct fuse_args *args = &ia.ap.args;
 	void *buf;
-	size_t bufsize = clamp((unsigned int) ctx->count, PAGE_SIZE, fc->max_pages << PAGE_SHIFT);
+	unsigned int count = (unsigned int)ctx->count;
+	size_t bufsize = (count && count != (unsigned int)INT_MAX) ?
+		clamp(count, (unsigned int)PAGE_SIZE, fc->max_pages << PAGE_SHIFT) :
+		PAGE_SIZE;
 	u64 attr_version = 0, evict_ctr = 0;
 	bool locked;
 
-- 
2.50.1
Re: [PATCH] fuse: do not treat unlimited readdir count as a buffer size
Posted by Miklos Szeredi 1 month, 2 weeks ago
On Tue, 28 Apr 2026 at 04:13, Matthew R. Ochs <mochs@nvidia.com> wrote:

> For virtiofs, the output kvec is included in the request bounce buffer
> allocated by copy_args_to_argbuf():
>
>   req->argbuf = kmalloc(len, GFP_ATOMIC);

Ugh. The real bug here is inappropriate use of the bounce buffer.
fuse_readdir_uncached() should instead supply an array of pages.

It's a little more complicated, but would fix this properly: overlayfs
does want to get as much of the directory as possible in one go to be
most efficient.

I'd go with replacing vmalloc with alloc_pages_bulk(), then vm_map_ram()
before parsing the result.

Thanks,
Miklos