fs/9p/vfs_dentry.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
From: Dominique Martinet <asmadeus@codewreck.org>
cache=mmap is a mode that doesn't cache metadata, but still has
writeback cache.
In commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate
for uncached mode too") we considered metadata cache to be enough to
not look at the server, but in writeback cache too looking at the server
size would make the vfs consider the file has been truncated before the
data has been flushed out, making the following repro fail (nothing is
ever read back, the resulting file ends up with no data written)
---
I was suspecting cache=mmap when I saw vmtest used that, and it's
confirmed with Song's reproducer...
This makes the repro work, would one of you be able to confirm this?
Once confirmed I'll send this to Linus directly
Thanks again and sorry for lack of cache=mmap tests :/
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
---
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
char buf[4096];
int main(int argc, char *argv[])
{
int ret, i;
int fdw, fdr;
if (argc < 2)
return 1;
fdw = openat(AT_FDCWD, argv[1], O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600);
if (fdw < 0) {
fprintf(stderr, "cannot open fdw\n");
return 1;
}
write(fdw, buf, sizeof(buf));
fdr = openat(AT_FDCWD, argv[1], O_RDONLY|O_CLOEXEC);
if (fdr < 0) {
fprintf(stderr, "cannot open fdr\n");
close(fdw);
return 1;
}
for (i = 0; i < 10; i++) {
ret = read(fdr, buf, sizeof(buf));
fprintf(stderr, "i: %d, read returns %d\n", i, ret);
}
close(fdr);
close(fdw);
return 0;
}
---
Reported-by: Song Liu <song@kernel.org>
Link: https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/bpf/CAEf4BzZbCE4tLoDZyUf_aASpgAGFj75QMfSXX4a4dLYixnOiLg@mail.gmail.com/
Fixes: 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for uncached mode too")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
---
fs/9p/vfs_dentry.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index f3248a3e54023489054337bf98e87a1d7b1fbafc..2f3654bf6275ffa802dc25b9a84bd61a62d5723c 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -78,7 +78,11 @@ static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
v9inode = V9FS_I(inode);
struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
- cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
+ /* We also don't want to refresh attr here in writeback cache mode,
+ * otherwise a write that hasn't been propagated to server would
+ * incorrectly get the old size back and truncate the file before
+ * the write happens */
+ cached = v9ses->cache & (CACHE_META | CACHE_WRITEBACK | CACHE_LOOSE);
if (!cached || v9inode->cache_validity & V9FS_INO_INVALID_ATTR) {
int retval;
---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251022-mmap-regression-f0f40d51046d
Best regards,
--
Dominique Martinet <asmadeus@codewreck.org>
I do apologize for missing this initially... (I wonder if it would have been caught by xfstests in cache=mmap mode) On 10/21/25 23:09, Dominique Martinet via B4 Relay wrote: > From: Dominique Martinet <asmadeus@codewreck.org> > > cache=mmap is a mode that doesn't cache metadata, but still has > writeback cache. > In commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate > for uncached mode too") we considered metadata cache to be enough to > not look at the server, but in writeback cache too looking at the server > size would make the vfs consider the file has been truncated before the > data has been flushed out, making the following repro fail (nothing is > ever read back, the resulting file ends up with no data written) > > --- > I was suspecting cache=mmap when I saw vmtest used that, and it's > confirmed with Song's reproducer... > This makes the repro work, would one of you be able to confirm this? > Once confirmed I'll send this to Linus directly > > Thanks again and sorry for lack of cache=mmap tests :/ > > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > --- > #include <fcntl.h> > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > > char buf[4096]; > > int main(int argc, char *argv[]) > { > int ret, i; > int fdw, fdr; > > if (argc < 2) > return 1; > > fdw = openat(AT_FDCWD, argv[1], O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600); > if (fdw < 0) { > fprintf(stderr, "cannot open fdw\n"); > return 1; > } > write(fdw, buf, sizeof(buf)); > > fdr = openat(AT_FDCWD, argv[1], O_RDONLY|O_CLOEXEC); > > if (fdr < 0) { > fprintf(stderr, "cannot open fdr\n"); > close(fdw); > return 1; > } > > for (i = 0; i < 10; i++) { > ret = read(fdr, buf, sizeof(buf)); > fprintf(stderr, "i: %d, read returns %d\n", i, ret); > } > > close(fdr); > close(fdw); > return 0; > } > --- > > Reported-by: Song Liu <song@kernel.org> > Link: https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> > Link: https://lore.kernel.org/bpf/CAEf4BzZbCE4tLoDZyUf_aASpgAGFj75QMfSXX4a4dLYixnOiLg@mail.gmail.com/ > Fixes: 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for uncached mode too") > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > --- > fs/9p/vfs_dentry.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c > index f3248a3e54023489054337bf98e87a1d7b1fbafc..2f3654bf6275ffa802dc25b9a84bd61a62d5723c 100644 > --- a/fs/9p/vfs_dentry.c > +++ b/fs/9p/vfs_dentry.c > @@ -78,7 +78,11 @@ static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags) > v9inode = V9FS_I(inode); > struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode); > > - cached = v9ses->cache & (CACHE_META | CACHE_LOOSE); > + /* We also don't want to refresh attr here in writeback cache mode, > + * otherwise a write that hasn't been propagated to server would > + * incorrectly get the old size back and truncate the file before > + * the write happens */ > + cached = v9ses->cache & (CACHE_META | CACHE_WRITEBACK | CACHE_LOOSE); This makes sense, but I was also wondering about this bit in v9fs_refresh_inode / ..._dotl: flags = (v9ses->cache & CACHE_LOOSE) ? V9FS_STAT2INODE_KEEP_ISIZE : 0; I do wonder what else can cause us to end up mistakenly updating the i_size, since the condition below would also pass if v9inode->cache_validity & V9FS_INO_INVALID_ATTR is true, even in cached mode: > > if (!cached || v9inode->cache_validity & V9FS_INO_INVALID_ATTR) { > int retval; One instances where we set this invalid flag is in v9fs_vfs_rename. With the program pasted below, which adds an additional renaming of the file written before reading, this seems to still cause truncation, even with this fix: #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/stat.h> char buf[4096]; int main(int argc, char *argv[]) { int ret, i; int fdw, fdr; struct stat statbuf; if (argc < 3) return 1; fdw = openat(AT_FDCWD, argv[1], O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0600); if (fdw < 0) { fprintf(stderr, "cannot open fdw\n"); return 1; } ret = write(fdw, buf, sizeof(buf)); if (ret < 0) { fprintf(stderr, "write failed\n"); return 1; } ret = renameat(AT_FDCWD, argv[1], AT_FDCWD, argv[2]); if (ret < 0) { fprintf(stderr, "renameat failed\n"); return 1; } fdr = openat(AT_FDCWD, argv[2], O_RDONLY | O_CLOEXEC); if (fdr < 0) { fprintf(stderr, "cannot open fdr\n"); return 1; } ret = read(fdr, buf, sizeof(buf)); fprintf(stderr, "read returns %d\n", ret); ret = fstat(fdr, &statbuf); if (ret < 0) { fprintf(stderr, "fstat failed\n"); return 1; } fprintf(stderr, "fstat: size = %lld\n", statbuf.st_size); close(fdr); unlink(argv[1]); unlink(argv[2]); return 0; } Output: root@v6.18-rc2:/# ./reproducer /tmp/9p/a /tmp/9p/b read returns 0 fstat: size = 0 root@v6.18-rc2:/# ./reproducer.orig /tmp/9p/a /tmp/9p/b # the original reproducer does work i: 0, read returns 4096 i: 1, read returns 0 ... Before the "Refresh metadata in d_revalidate for uncached mode too" patch series, it happened that in cache=mmap, v9fs_lookup_revalidate would not be called at all, since it's not used for "uncached" mode, where uncached is defined as !(CACHE_META|CACHE_LOOSE) in v9fs_mount. Therefore this is probably still a regression introduced by that series :( The below change fixes both reproducer: diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c index 69f378a83775..316fc41513d7 100644 --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -1352,7 +1352,7 @@ int v9fs_refresh_inode(struct p9_fid *fid, struct inode *inode) * We don't want to refresh inode->i_size, * because we may have cached data */ - flags = (v9ses->cache & CACHE_LOOSE) ? + flags = (v9ses->cache & (CACHE_LOOSE | CACHE_WRITEBACK)) ? V9FS_STAT2INODE_KEEP_ISIZE : 0; v9fs_stat2inode(st, inode, inode->i_sb, flags); out: @@ -1399,4 +1399,3 @@ static const struct inode_operations v9fs_symlink_inode_operations = { .getattr = v9fs_vfs_getattr, .setattr = v9fs_vfs_setattr, }; - diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c index 0b404e8484d2..500d8e922f07 100644 --- a/fs/9p/vfs_inode_dotl.c +++ b/fs/9p/vfs_inode_dotl.c @@ -910,7 +910,7 @@ int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode) * We don't want to refresh inode->i_size, * because we may have cached data */ - flags = (v9ses->cache & CACHE_LOOSE) ? + flags = (v9ses->cache & (CACHE_LOOSE | CACHE_WRITEBACK)) ? V9FS_STAT2INODE_KEEP_ISIZE : 0; v9fs_stat2inode_dotl(st, inode, flags); out: Should we change this too? (Well, in fact, the change above alone makes both issue disappear, but I'm not 100% sure about the implication of updating metadata aside from i_size for inodes with cached data) Kind regards, Tingmao
On Tue, Oct 21, 2025 at 3:10 PM Dominique Martinet via B4 Relay <devnull+asmadeus.codewreck.org@kernel.org> wrote: > > From: Dominique Martinet <asmadeus@codewreck.org> [...] > --- > > Reported-by: Song Liu <song@kernel.org> > Link: https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> > Link: https://lore.kernel.org/bpf/CAEf4BzZbCE4tLoDZyUf_aASpgAGFj75QMfSXX4a4dLYixnOiLg@mail.gmail.com/ > Fixes: 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for uncached mode too") > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> I can confirm this fixes bpftrace and the reproducer in the VM. Tested-by: Song Liu <song@kernel.org> Thanks for the quick fix! [...]
Hi Linus, We had a regression with cache=mmap that impacted quite a few people so I'm sending a fix less than a couple of hours after making the commit. If it turns out there are other side effects I'd suggest just reverting commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for uncached mode too") first, but the fix is rather minimal so I think it's ok to try falling forward -- let me know if you prefer a revert and I'll send one instead (there's a minor conflict) Thanks to Sung Liu for the minimal reproducer and testing, as well as Alexei/Andrii and everyone else who looked at it. The following changes since commit 211ddde0823f1442e4ad052a2f30f050145ccada: Linux 6.18-rc2 (2025-10-19 15:19:16 -1000) are available in the Git repository at: https://github.com/martinetd/linux tags/9p-for-6.18-rc3 for you to fetch changes up to 2776e27d404684bc43acf023d7ca15255e96b3e3: fs/9p: don't use cached metadata in revalidate for cache=mmap (2025-10-22 08:04:05 +0900) ---------------------------------------------------------------- Fix regression with cache=mmap in 6.18-rc1 Will do some more testing as time allows but this fixes the immediate issue minimally (only impacts cache=mmap), and is therefore an improvement good enough to send right away. ---------------------------------------------------------------- Dominique Martinet (1): fs/9p: don't use cached metadata in revalidate for cache=mmap fs/9p/vfs_dentry.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) -- Dominique Martinet | Asmadeus
On 10/22/25 00:16, Dominique Martinet wrote: > Hi Linus, > > We had a regression with cache=mmap that impacted quite a few people so > I'm sending a fix less than a couple of hours after making the commit. > > If it turns out there are other side effects I'd suggest just reverting > commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for > uncached mode too") first, but the fix is rather minimal so I think it's > ok to try falling forward -- let me know if you prefer a revert and I'll > send one instead (there's a minor conflict) See the reply to the original patch [1] (posted right after, and before seeing, this message) - there is indeed more side effects, and I wouldn't mind a revert for now. 0172a934747f ("fs/9p: Invalidate dentry if inode type change detected in cached mode") will need to be reverted too. [1]: https://lore.kernel.org/v9fs/6c74ad63-3afc-4549-9ac6-494b9a63e839@maowtm.org/
Tingmao Wang wrote on Wed, Oct 22, 2025 at 12:34:13AM +0100: > On 10/22/25 00:16, Dominique Martinet wrote: > > We had a regression with cache=mmap that impacted quite a few people so > > I'm sending a fix less than a couple of hours after making the commit. > > > > If it turns out there are other side effects I'd suggest just reverting > > commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for > > uncached mode too") first, but the fix is rather minimal so I think it's > > ok to try falling forward -- let me know if you prefer a revert and I'll > > send one instead (there's a minor conflict) > > See the reply to the original patch [1] (posted right after, and before > seeing, this message) - there is indeed more side effects, and I wouldn't > mind a revert for now. 0172a934747f ("fs/9p: Invalidate dentry if inode > type change detected in cached mode") will need to be reverted too. (yeah, and even that conflicts due to the added debug message in the next commit, but I went the heavy-handed way and removed the conflicting hunk so that commit's also implicitly been removed. In hindsight it would have been cleaner to revert the three commits 290434474c33^..c667c54c5875 -- ohwell) > [1]: https://lore.kernel.org/v9fs/6c74ad63-3afc-4549-9ac6-494b9a63e839@maowtm.org/ OK, so let's go with the revert for now as I don't have time to look for more corner cases immediately. Linus, here's the new PR: --------- The following changes since commit 211ddde0823f1442e4ad052a2f30f050145ccada: Linux 6.18-rc2 (2025-10-19 15:19:16 -1000) are available in the Git repository at: https://github.com/martinetd/linux tags/9p-for-6.18-rc3-v2 for you to fetch changes up to 43c36a56ccf6d9b07b4b3f4f614756e687dcdc01: Revert "fs/9p: Refresh metadata in d_revalidate for uncached mode too" (2025-10-22 14:25:27 +0900) ---------------------------------------------------------------- Fix 9p cache=mmap regression by revert This reverts the problematic commit instead of trying to fix it in a rush ---------------------------------------------------------------- Dominique Martinet (1): Revert "fs/9p: Refresh metadata in d_revalidate for uncached mode too" fs/9p/vfs_dentry.c | 10 ++-------- fs/9p/vfs_inode.c | 8 +------- fs/9p/vfs_inode_dotl.c | 8 +------- 3 files changed, 4 insertions(+), 22 deletions(-) -- Dominique Martinet | Asmadeus
© 2016 - 2025 Red Hat, Inc.