[PATCH v5 37/40] netfs: Optimise away reads above the point at which there can be no data

David Howells posted 40 patches 2 years ago
Posted by David Howells 2 years ago
Track the file position above which the server is not expected to have any
data (the "zero point") and preemptively assume that we can satisfy
requests by filling them with zeroes locally rather than attempting to
download them if they're over that line - even if we've written data back
to the server.  Assume that any data that was written back above that
position is held in the local cache.  Note that we have to split requests
that straddle the line.
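
As a rough illustration of the splitting described above, here is a minimal
userspace sketch in C.  It is not the kernel code: the names read_source,
read_slice, plan_slice() and bytes_from_server() are hypothetical, and the
model ignores the DIO exception and the clamping to i_size and rsize that
the real netfs_rreq_prepare_read() also performs.

```c
/* Hypothetical userspace model; none of these names exist in the kernel. */
enum read_source { READ_FROM_SERVER, READ_FILL_ZEROES };

struct read_slice {
	enum read_source source;
	unsigned long long start;
	unsigned long long len;
};

/*
 * Plan the next slice of the request [start, start + len).  A slice that
 * begins at or above zero_point is zero-filled locally; a slice that
 * straddles zero_point is shortened so that it stops at the line.
 */
static inline struct read_slice plan_slice(unsigned long long start,
					   unsigned long long len,
					   unsigned long long zero_point)
{
	struct read_slice s = { READ_FROM_SERVER, start, len };

	if (start >= zero_point)
		s.source = READ_FILL_ZEROES;
	else if (len > zero_point - start)
		s.len = zero_point - start;
	return s;
}

/* Walk a whole request and count the bytes that still need the server. */
static inline unsigned long long bytes_from_server(unsigned long long start,
						   unsigned long long len,
						   unsigned long long zero_point)
{
	unsigned long long total = 0;

	while (len > 0) {
		struct read_slice s = plan_slice(start, len, zero_point);

		if (s.source == READ_FROM_SERVER)
			total += s.len;
		start += s.len;
		len -= s.len;
	}
	return total;
}
```

In this model an 8KiB read at offset 0 with zero_point at 4KiB becomes one
4KiB download followed by one 4KiB zero-fill.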

Make use of this to optimise away some reads from the server.  We need to
set the zero point in the following circumstances:

 (1) When we see an extant remote inode and have no cache for it, we set
     the zero_point to i_size.

 (2) On local inode creation, we set zero_point to 0.

 (3) On local truncation down, we reduce zero_point to the new i_size if
     the new i_size is lower.

 (4) On local truncation up, we don't change zero_point.

 (5) On local modification, we don't change zero_point.

 (6) On remote invalidation, we set zero_point to the new i_size.

 (7) If stored data is discarded from the pagecache or culled from fscache,
     we must set zero_point above that if the data also got written to the
     server.

 (8) If dirty data is written back to the server, but not fscache, we must
     set zero_point above that.

 (9) If a direct I/O write is made, set zero_point above that.

Assuming the above, any read from the server at or above the zero_point
position will return all zeroes.

The zero_point value can be stored in the cache, provided the above rules
are applied to it by any code that culls part of the local cache.
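
The rules above can be sketched as a small userspace model in C.  This is
only an illustration under stated assumptions: struct zp_model and the zp_*()
helpers are hypothetical names, locking is ignored, and rules (7) to (9) are
collapsed into a single "raise to the end of the affected range" helper.

```c
#include <stdbool.h>

/* Hypothetical model of the bookkeeping; not the kernel's netfs_inode. */
struct zp_model {
	long long i_size;	/* Local idea of the file size */
	long long zero_point;	/* Server assumed empty at or above this */
};

/* Rules (1) and (6): extant remote inode without cache, or invalidation. */
static inline void zp_set_from_remote(struct zp_model *m, long long size)
{
	m->i_size = size;
	m->zero_point = size;
}

/* Rule (2): local inode creation. */
static inline void zp_create_local(struct zp_model *m)
{
	m->i_size = 0;
	m->zero_point = 0;
}

/* Rules (3) and (4): truncation down may lower zero_point, up never moves it. */
static inline void zp_truncate(struct zp_model *m, long long new_size)
{
	if (new_size < m->zero_point)
		m->zero_point = new_size;
	m->i_size = new_size;
}

/* Rules (7), (8) and (9): data up to @end may now exist only on the server. */
static inline void zp_raise(struct zp_model *m, long long end)
{
	if (end > m->zero_point)
		m->zero_point = end;
}

/* A read starting at or above zero_point can be satisfied with zeroes. */
static inline bool zp_can_zero_fill(const struct zp_model *m, long long pos)
{
	return pos >= m->zero_point;
}
```

Rule (5), local modification, needs no helper in this model because it
leaves zero_point untouched.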

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/afs/inode.c            | 22 +++++++++++++---------
 fs/netfs/buffered_write.c |  2 +-
 fs/netfs/direct_write.c   |  4 ++++
 fs/netfs/io.c             | 10 ++++++++++
 fs/netfs/misc.c           |  5 +++++
 fs/smb/client/cifsfs.c    |  4 ++--
 include/linux/netfs.h     | 14 ++++++++++++--
 7 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 37485ae31471..2b44a342b4a1 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -168,6 +168,7 @@ static void afs_apply_status(struct afs_operation *op,
 	struct inode *inode = &vnode->netfs.inode;
 	struct timespec64 t;
 	umode_t mode;
+	bool unexpected_jump = false;
 	bool data_changed = false;
 	bool change_size = vp->set_size;
 
@@ -231,6 +232,7 @@ static void afs_apply_status(struct afs_operation *op,
 		}
 		change_size = true;
 		data_changed = true;
+		unexpected_jump = true;
 	} else if (vnode->status.type == AFS_FTYPE_DIR) {
 		/* Expected directory change is handled elsewhere so
 		 * that we can locally edit the directory and save on a
@@ -252,6 +254,8 @@ static void afs_apply_status(struct afs_operation *op,
 		vnode->netfs.remote_i_size = status->size;
 		if (change_size || status->size > i_size_read(inode)) {
 			afs_set_i_size(vnode, status->size);
+			if (unexpected_jump)
+				vnode->netfs.zero_point = status->size;
 			inode_set_ctime_to_ts(inode, t);
 			inode_set_atime_to_ts(inode, t);
 		}
@@ -865,17 +869,17 @@ static void afs_setattr_success(struct afs_operation *op)
 static void afs_setattr_edit_file(struct afs_operation *op)
 {
 	struct afs_vnode_param *vp = &op->file[0];
-	struct inode *inode = &vp->vnode->netfs.inode;
+	struct afs_vnode *vnode = vp->vnode;
 
 	if (op->setattr.attr->ia_valid & ATTR_SIZE) {
 		loff_t size = op->setattr.attr->ia_size;
 		loff_t i_size = op->setattr.old_i_size;
 
-		if (size < i_size)
-			truncate_pagecache(inode, size);
-		if (size != i_size)
-			fscache_resize_cookie(afs_vnode_cache(vp->vnode),
-					      vp->scb.status.size);
+		if (size != i_size) {
+			truncate_setsize(&vnode->netfs.inode, size);
+			netfs_resize_file(&vnode->netfs, size, true);
+			fscache_resize_cookie(afs_vnode_cache(vnode), size);
+		}
 	}
 }
 
@@ -943,11 +947,11 @@ int afs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		 */
 		if (!(attr->ia_valid & (supported & ~ATTR_SIZE & ~ATTR_MTIME)) &&
 		    attr->ia_size < i_size &&
-		    attr->ia_size > vnode->status.size) {
-			truncate_pagecache(inode, attr->ia_size);
+		    attr->ia_size > vnode->netfs.remote_i_size) {
+			truncate_setsize(inode, attr->ia_size);
+			netfs_resize_file(&vnode->netfs, size, false);
 			fscache_resize_cookie(afs_vnode_cache(vnode),
 					      attr->ia_size);
-			i_size_write(inode, attr->ia_size);
 			ret = 0;
 			goto out_unlock;
 		}
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 6ca6c4bde5eb..08f28800232c 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -73,7 +73,7 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
 	if (folio_test_uptodate(folio))
 		return NETFS_FOLIO_IS_UPTODATE;
 
-	if (pos >= ctx->remote_i_size)
+	if (pos >= ctx->zero_point)
 		return NETFS_MODIFY_AND_CLEAR;
 
 	if (!maybe_trouble && offset == 0 && len >= flen)
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index bb0c2718f57b..aad05f2349a4 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -134,6 +134,7 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 	struct netfs_inode *ictx = netfs_inode(inode);
+	unsigned long long end;
 	ssize_t ret;
 
 	_enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
@@ -155,6 +156,9 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ret = kiocb_invalidate_pages(iocb, iov_iter_count(from));
 	if (ret < 0)
 		goto out;
+	end = iocb->ki_pos + iov_iter_count(from);
+	if (end > ictx->zero_point)
+		ictx->zero_point = end;
 
 	fscache_invalidate(netfs_i_cookie(ictx), NULL, i_size_read(inode),
 			   FSCACHE_INVAL_DIO_WRITE);
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 14c18be5aca0..5b5af96cd4b9 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -569,6 +569,7 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
 			struct iov_iter *io_iter)
 {
 	enum netfs_io_source source = NETFS_DOWNLOAD_FROM_SERVER;
+	struct netfs_inode *ictx = netfs_inode(rreq->inode);
 	size_t lsize;
 
 	_enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size);
@@ -586,6 +587,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
 		 * to make serial calls, it can indicate a short read and then
 		 * we will call it again.
 		 */
+		if (rreq->origin != NETFS_DIO_READ) {
+			if (subreq->start >= ictx->zero_point) {
+				source = NETFS_FILL_WITH_ZEROES;
+				goto set;
+			}
+			if (subreq->len > ictx->zero_point - subreq->start)
+				subreq->len = ictx->zero_point - subreq->start;
+		}
 		if (subreq->len > rreq->i_size - subreq->start)
 			subreq->len = rreq->i_size - subreq->start;
 		if (rreq->rsize && subreq->len > rreq->rsize)
@@ -607,6 +616,7 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
 		}
 	}
 
+set:
 	if (subreq->len > rreq->len)
 		pr_warn("R=%08x[%u] SREQ>RREQ %zx > %zx\n",
 			rreq->debug_id, subreq->debug_index,
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index eeb44abe59c5..0e3af37fc924 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -240,6 +240,11 @@ EXPORT_SYMBOL(netfs_invalidate_folio);
 bool netfs_release_folio(struct folio *folio, gfp_t gfp)
 {
 	struct netfs_inode *ctx = netfs_inode(folio_inode(folio));
+	unsigned long long end;
+
+	end = folio_pos(folio) + folio_size(folio);
+	if (end > ctx->zero_point)
+		ctx->zero_point = end;
 
 	if (folio_test_private(folio))
 		return false;
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 96a65cf9b5ec..07cd88897c33 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1220,7 +1220,7 @@ static int cifs_precopy_set_eof(struct inode *src_inode, struct cifsInodeInfo *s
 	if (rc < 0)
 		goto set_failed;
 
-	netfs_resize_file(&src_cifsi->netfs, src_end);
+	netfs_resize_file(&src_cifsi->netfs, src_end, true);
 	fscache_resize_cookie(cifs_inode_cookie(src_inode), src_end);
 	return 0;
 
@@ -1351,7 +1351,7 @@ static loff_t cifs_remap_file_range(struct file *src_file, loff_t off,
 			smb_file_src, smb_file_target, off, len, destoff);
 		if (rc == 0 && new_size > i_size_read(target_inode)) {
 			truncate_setsize(target_inode, new_size);
-			netfs_resize_file(&target_cifsi->netfs, new_size);
+			netfs_resize_file(&target_cifsi->netfs, new_size, true);
 			fscache_resize_cookie(cifs_inode_cookie(target_inode),
 					      new_size);
 		}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 8cde618cf6d9..a5374218efe4 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -136,6 +136,8 @@ struct netfs_inode {
 	struct fscache_cookie	*cache;
 #endif
 	loff_t			remote_i_size;	/* Size of the remote file */
+	loff_t			zero_point;	/* Size after which we assume there's no data
+						 * on the server */
 	unsigned long		flags;
 #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
 #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
@@ -463,22 +465,30 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
 {
 	ctx->ops = ops;
 	ctx->remote_i_size = i_size_read(&ctx->inode);
+	ctx->zero_point = ctx->remote_i_size;
 	ctx->flags = 0;
 #if IS_ENABLED(CONFIG_FSCACHE)
 	ctx->cache = NULL;
 #endif
+	/* ->releasepage() drives zero_point */
+	mapping_set_release_always(ctx->inode.i_mapping);
 }
 
 /**
  * netfs_resize_file - Note that a file got resized
  * @ctx: The netfs inode being resized
  * @new_i_size: The new file size
+ * @changed_on_server: The change was applied to the server
  *
  * Inform the netfs lib that a file got resized so that it can adjust its state.
  */
-static inline void netfs_resize_file(struct netfs_inode *ctx, loff_t new_i_size)
+static inline void netfs_resize_file(struct netfs_inode *ctx, loff_t new_i_size,
+				     bool changed_on_server)
 {
-	ctx->remote_i_size = new_i_size;
+	if (changed_on_server)
+		ctx->remote_i_size = new_i_size;
+	if (new_i_size < ctx->zero_point)
+		ctx->zero_point = new_i_size;
 }
 
 /**
Re: [PATCH v5 37/40] netfs: Optimise away reads above the point at which there can be no data
Posted by Nathan Chancellor 2 years ago
Hi David,

On Thu, Dec 21, 2023 at 01:23:32PM +0000, David Howells wrote:
> Track the file position above which the server is not expected to have any
> data (the "zero point") and preemptively assume that we can satisfy
> requests by filling them with zeroes locally rather than attempting to
> download them if they're over that line - even if we've written data back
> to the server.  Assume that any data that was written back above that
> position is held in the local cache.  Note that we have to split requests
> that straddle the line.
> 
> Make use of this to optimise away some reads from the server.  We need to
> set the zero point in the following circumstances:
> 
>  (1) When we see an extant remote inode and have no cache for it, we set
>      the zero_point to i_size.
> 
>  (2) On local inode creation, we set zero_point to 0.
> 
>  (3) On local truncation down, we reduce zero_point to the new i_size if
>      the new i_size is lower.
> 
>  (4) On local truncation up, we don't change zero_point.
> 
>  (5) On local modification, we don't change zero_point.
> 
>  (6) On remote invalidation, we set zero_point to the new i_size.
> 
>  (7) If stored data is discarded from the pagecache or culled from fscache,
>      we must set zero_point above that if the data also got written to the
>      server.
> 
>  (8) If dirty data is written back to the server, but not fscache, we must
>      set zero_point above that.
> 
>  (9) If a direct I/O write is made, set zero_point above that.
> 
> Assuming the above, any read from the server at or above the zero_point
> position will return all zeroes.
> 
> The zero_point value can be stored in the cache, provided the above rules
> are applied to it by any code that culls part of the local cache.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-cachefs@redhat.com
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---

<snip>

> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 8cde618cf6d9..a5374218efe4 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -136,6 +136,8 @@ struct netfs_inode {
>  	struct fscache_cookie	*cache;
>  #endif
>  	loff_t			remote_i_size;	/* Size of the remote file */
> +	loff_t			zero_point;	/* Size after which we assume there's no data
> +						 * on the server */
>  	unsigned long		flags;
>  #define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
>  #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
> @@ -463,22 +465,30 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
>  {
>  	ctx->ops = ops;
>  	ctx->remote_i_size = i_size_read(&ctx->inode);
> +	ctx->zero_point = ctx->remote_i_size;
>  	ctx->flags = 0;
>  #if IS_ENABLED(CONFIG_FSCACHE)
>  	ctx->cache = NULL;
>  #endif
> +	/* ->releasepage() drives zero_point */
> +	mapping_set_release_always(ctx->inode.i_mapping);
>  }

I bisected a crash that I see when trying to mount an NFS volume to this
change as commit 6e3c8451f624 ("netfs: Optimise away reads above the
point at which there can be no data") in next-20231221:

  [   45.964963] BUG: kernel NULL pointer dereference, address: 0000000000000078
  [   45.964975] #PF: supervisor write access in kernel mode
  [   45.964982] #PF: error_code(0x0002) - not-present page
  [   45.964987] PGD 0 P4D 0
  [   45.964996] Oops: 0002 [#1] PREEMPT SMP NOPTI
  [   45.965004] CPU: 2 PID: 2419 Comm: mount.nfs Not tainted 6.7.0-rc6-next-20231221-debug-09925-g857647efa9be #1 adbbe7bc5037c662bc8f9b8e78ccf16be15b5e58
  [   45.965014] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020
  [   45.965019] RIP: 0010:nfs_alloc_inode+0xa2/0xc0 [nfs]
  [   45.965092] Code: 80 b0 01 00 00 00 00 00 00 48 c7 80 38 04 00 00 00 f7 1e c2 48 c7 80 58 04 00 00 00 00 00 00 48 c7 80 40 04 00 00 00 00 00 00 <f0> 80 0a 80 48 05 b8 01 00 00 e9 5f 2b 20 f5 66 66 2e 0f 1f 84 00
  [   45.965099] RSP: 0018:ffffc900058f7bc0 EFLAGS: 00010286
  [   45.965107] RAX: ffff8881958c7290 RBX: ffff888168f0f800 RCX: 0000000000000000
  [   45.965112] RDX: 0000000000000078 RSI: ffffffffc2140a71 RDI: ffff88817a12b880
  [   45.965118] RBP: ffff888168f0f800 R08: ffffc900058f7b70 R09: 88728c958188ffff
  [   45.965123] R10: 000000000003a5c0 R11: 0000000000000005 R12: ffffffffc22f1a80
  [   45.965128] R13: ffffc900058f7c30 R14: 0000000000000000 R15: 0000000000000002
  [   45.965134] FS:  00007ff78c318740(0000) GS:ffff8887ff280000(0000) knlGS:0000000000000000
  [   45.965140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   45.965146] CR2: 0000000000000078 CR3: 000000018a514000 CR4: 0000000000350ef0
  [   45.965152] Call Trace:
  [   45.965160]  <TASK>
  [   45.965167]  ? __die+0x23/0x70
  [   45.965183]  ? page_fault_oops+0x173/0x4e0
  [   45.965197]  ? nfs_alloc_inode+0x21/0xc0 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965279]  ? exc_page_fault+0x7e/0x180
  [   45.965291]  ? asm_exc_page_fault+0x26/0x30
  [   45.965308]  ? nfs_alloc_inode+0x21/0xc0 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965374]  ? nfs_alloc_inode+0xa2/0xc0 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965441]  alloc_inode+0x1e/0xc0
  [   45.965452]  ? __pfx_nfs_find_actor+0x10/0x10 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965517]  iget5_locked+0x97/0xf0
  [   45.965525]  ? __pfx_nfs_init_locked+0x10/0x10 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965593]  nfs_fhget+0xe4/0x700 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965666]  nfs_get_root+0xc6/0x4a0 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965732]  ? kernfs_rename_ns+0x85/0x210
  [   45.965754]  nfs_get_tree_common+0xc7/0x520 [nfs aac4a012b174ef6e5996d0df3638a0616e82eb47]
  [   45.965826]  vfs_get_tree+0x29/0xf0
  [   45.965836]  fc_mount+0x12/0x40
  [   45.965846]  do_nfs4_mount+0x12e/0x370 [nfsv4 9bac1f2bd94d7294fbbaf875b7b5cec5adc527f5]
  [   45.965946]  nfs4_try_get_tree+0x48/0xd0 [nfsv4 9bac1f2bd94d7294fbbaf875b7b5cec5adc527f5]
  [   45.966034]  vfs_get_tree+0x29/0xf0
  [   45.966041]  ? srso_return_thunk+0x5/0x5f
  [   45.966051]  path_mount+0x4ca/0xb10
  [   45.966063]  __x64_sys_mount+0x11a/0x150
  [   45.966074]  do_syscall_64+0x64/0xe0
  [   45.966083]  ? do_syscall_64+0x70/0xe0
  [   45.966090]  ? syscall_exit_to_user_mode+0x2b/0x40
  [   45.966098]  ? srso_return_thunk+0x5/0x5f
  [   45.966106]  ? do_syscall_64+0x70/0xe0
  [   45.966113]  ? srso_return_thunk+0x5/0x5f
  [   45.966121]  ? exc_page_fault+0x7e/0x180
  [   45.966130]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
  [   45.966138] RIP: 0033:0x7ff78c5f2a1e
  ...

It appears that ctx->inode.i_mapping is NULL in netfs_inode_init(). This
patch appears to cure the problem for me but I am not sure if it is
proper or not.

Cheers,
Nathan

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index a5374218efe4..8daaba665421 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -471,7 +471,8 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
 	ctx->cache = NULL;
 #endif
 	/* ->releasepage() drives zero_point */
-	mapping_set_release_always(ctx->inode.i_mapping);
+	if (ctx->inode.i_mapping)
+		mapping_set_release_always(ctx->inode.i_mapping);
 }
 
 /**
Re: [PATCH v5 37/40] netfs: Optimise away reads above the point at which there can be no data
Posted by David Howells 2 years ago
Nathan Chancellor <nathan@kernel.org> wrote:

> It appears that ctx->inode.i_mapping is NULL in netfs_inode_init(). This
> patch appears to cure the problem for me but I am not sure if it is
> proper or not.

I'm not sure that's the best way.  It kind of indicates that
nfs_netfs_inode_init() is not being called in the right place - it should
really be called after alloc_inode() has called inode_init_always().

However, mapping_set_release_always() makes ->release_folio() and
->invalidate_folio() always called for an inode's folios, even if PG_private
is not set - the idea being that this allows netfslib to update the
"zero_point" when a page we've written to the server gets invalidated here,
thereby requiring us to go fetch it again.
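
To illustrate why the flag matters, here is a minimal userspace sketch in C.
All names in it (model_mapping, model_folio, model_release_hook_called(),
model_evict_folio()) are hypothetical; it only models the decision "is the
release hook called?" and the zero_point bump made by the patch in
netfs_release_folio().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the address_space and folio state involved. */
struct model_mapping {
	bool release_always;	/* Models mapping_set_release_always() */
};

struct model_folio {
	unsigned long long pos;		/* File position of the folio */
	unsigned long long size;	/* Size of the folio in bytes */
	bool has_private;		/* Models PG_private being set */
};

/* The release hook is reached if the folio has private data or the
 * mapping asked for the hook to be called unconditionally.
 */
static inline bool model_release_hook_called(const struct model_mapping *mapping,
					     const struct model_folio *folio)
{
	return folio->has_private || mapping->release_always;
}

/* Evict a folio: if the hook runs, raise zero_point past the folio's end so
 * that a later read of this range goes back to the server.
 */
static inline void model_evict_folio(const struct model_mapping *mapping,
				     const struct model_folio *folio,
				     unsigned long long *zero_point)
{
	unsigned long long end = folio->pos + folio->size;

	if (!model_release_hook_called(mapping, folio))
		return;
	if (end > *zero_point)
		*zero_point = end;
}
```

In this model, a clean folio without private data that is evicted from a
mapping lacking the flag leaves zero_point stale, which is the case the
flag is meant to close.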

Now, NFS doesn't make use of this feature and fscache and cachefiles don't use
it directly, so we might not want to call mapping_set_release_always() for
NFS.

I'm not sure NFS can even reliably make use of it unless it's using a lease
or it gets change notifications from the server.

So I'm thinking of applying your patch but adding a comment to say why we're
doing it.  A better way, though, is to move the call to nfs_netfs_inode_init()
and give it a flag to say whether or not we want the facility.

David
[PATCH] Fix oops in NFS
Posted by David Howells 2 years ago
David Howells <dhowells@redhat.com> wrote:

> A better way, though, is to move the call to nfs_netfs_inode_init()
> and give it a flag to say whether or not we want the facility.

Okay, I think I'll fold in the attached change.

David
---
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 55345753ae8d..b66466e97459 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -249,7 +249,7 @@ void v9fs_free_inode(struct inode *inode)
 static void v9fs_set_netfs_context(struct inode *inode)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	netfs_inode_init(&v9inode->netfs, &v9fs_req_ops);
+	netfs_inode_init(&v9inode->netfs, &v9fs_req_ops, true);
 }
 
 int v9fs_init_inode(struct v9fs_session_info *v9ses,
diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c
index 1f656005018e..9c517269ff95 100644
--- a/fs/afs/dynroot.c
+++ b/fs/afs/dynroot.c
@@ -76,7 +76,7 @@ struct inode *afs_iget_pseudo_dir(struct super_block *sb, bool root)
 	/* there shouldn't be an existing inode */
 	BUG_ON(!(inode->i_state & I_NEW));
 
-	netfs_inode_init(&vnode->netfs, NULL);
+	netfs_inode_init(&vnode->netfs, NULL, false);
 	inode->i_size		= 0;
 	inode->i_mode		= S_IFDIR | S_IRUGO | S_IXUGO;
 	if (root) {
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 2b44a342b4a1..381521e9e118 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -58,7 +58,7 @@ static noinline void dump_vnode(struct afs_vnode *vnode, struct afs_vnode *paren
  */
 static void afs_set_netfs_context(struct afs_vnode *vnode)
 {
-	netfs_inode_init(&vnode->netfs, &afs_req_ops);
+	netfs_inode_init(&vnode->netfs, &afs_req_ops, true);
 }
 
 /*
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 3149d79a9dbe..0c25d326afc4 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -574,7 +574,7 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	doutc(fsc->client, "%p\n", &ci->netfs.inode);
 
 	/* Set parameters for the netfs library */
-	netfs_inode_init(&ci->netfs, &ceph_netfs_ops);
+	netfs_inode_init(&ci->netfs, &ceph_netfs_ops, false);
 
 	spin_lock_init(&ci->i_ceph_lock);
 
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 5407ab8c8783..e3cb4923316b 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -80,7 +80,7 @@ static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
 }
 static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
 {
-	netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops);
+	netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops, false);
 }
 extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
 extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index a5374218efe4..06a03dd1aff1 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -456,22 +456,27 @@ static inline struct netfs_inode *netfs_inode(struct inode *inode)
  * netfs_inode_init - Initialise a netfslib inode context
  * @ctx: The netfs inode to initialise
  * @ops: The netfs's operations list
+ * @use_zero_point: True to use the zero_point read optimisation
  *
  * Initialise the netfs library context struct.  This is expected to follow on
  * directly from the VFS inode struct.
  */
 static inline void netfs_inode_init(struct netfs_inode *ctx,
-				    const struct netfs_request_ops *ops)
+				    const struct netfs_request_ops *ops,
+				    bool use_zero_point)
 {
 	ctx->ops = ops;
 	ctx->remote_i_size = i_size_read(&ctx->inode);
-	ctx->zero_point = ctx->remote_i_size;
+	ctx->zero_point = LLONG_MAX;
 	ctx->flags = 0;
 #if IS_ENABLED(CONFIG_FSCACHE)
 	ctx->cache = NULL;
 #endif
 	/* ->releasepage() drives zero_point */
-	mapping_set_release_always(ctx->inode.i_mapping);
+	if (use_zero_point) {
+		ctx->zero_point = ctx->remote_i_size;
+		mapping_set_release_always(ctx->inode.i_mapping);
+	}
 }
 
 /**
Re: [PATCH] Fix oops in NFS
Posted by Matthew Wilcox 1 year, 11 months ago
On Fri, Dec 22, 2023 at 12:00:51PM +0000, David Howells wrote:
> David Howells <dhowells@redhat.com> wrote:
> 
> > A better way, though, is to move the call to nfs_netfs_inode_init()
> > and give it a flag to say whether or not we want the facility.
> 
> Okay, I think I'll fold in the attached change.

This commit (100ccd18bb41 in linux-next 20240104) is bad for me.  After
it, running xfstests gives me first a bunch of errors along these lines:

00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/fs/gfs2/gfs2.ko: Exec format error
00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/fs/zonefs/zonefs.ko: Exec format error
00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/security/keys/encrypted-keys/encrypted-keys.ko: Exec format error

and then later:

00016 generic/001       run fstests generic/001 at 2024-01-05 04:50:46
00017 [not run] this test requires a valid $TEST_DEV
00017 generic/002       run fstests generic/002 at 2024-01-05 04:50:46
00017 [not run] this test requires a valid $TEST_DEV
00017 generic/003       run fstests generic/003 at 2024-01-05 04:50:47
00018 [not run] this test requires a valid $SCRATCH_DEV
...

so I think that's page cache corruption of some kind.
Re: [PATCH] Fix oops in NFS
Posted by David Howells 1 year, 11 months ago
Matthew Wilcox <willy@infradead.org> wrote:

> This commit (100ccd18bb41 in linux-next 20240104) is bad for me.  After
> it, running xfstests gives me first a bunch of errors along these lines:

This may be related to a patch that is in linux-next 20240105, but not
20240104 ("9p: Fix initialisation of netfs_inode for 9p").

David
Re: [PATCH] Fix oops in NFS
Posted by Dominique Martinet 1 year, 11 months ago
Matthew Wilcox wrote on Fri, Jan 05, 2024 at 01:17:36PM +0000:
> host on /host type 9p (rw,relatime,access=client,trans=virtio)

David Howells wrote on Fri, Jan 05, 2024 at 02:33:32PM +0000:
> > This commit (100ccd18bb41 in linux-next 20240104) is bad for me.  After
> > it, running xfstests gives me first a bunch of errors along these lines:
> 
> This may be related to a patch that is in linux-next 20240105, but not
> 20240104 ("9p: Fix initialisation of netfs_inode for 9p").

Yes, you'd be reading zeroes without that patch because the netfs code
thinks the file has 0 size and doesn't bother reading, that'd explain
the exec format error loading other modules...
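
A minimal userspace sketch of that failure mode, in C.  The names
model_inode, model_inode_init() and model_read_is_zero_filled() are
hypothetical.  The init helper mirrors the shape of the folded-in
netfs_inode_init() from earlier in the thread, and the model assumes the
context was initialised before the real file size was known.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the fields involved; not the kernel structs. */
struct model_inode {
	long long i_size;
	long long remote_i_size;
	long long zero_point;
};

/* Same shape as the folded-in netfs_inode_init(): zero_point only tracks
 * the size when the optimisation is requested, else it is out of the way.
 */
static inline void model_inode_init(struct model_inode *ino, bool use_zero_point)
{
	ino->remote_i_size = ino->i_size;
	ino->zero_point = LLONG_MAX;
	if (use_zero_point)
		ino->zero_point = ino->remote_i_size;
}

/* A read at or above zero_point is answered with zeroes, not a download. */
static inline bool model_read_is_zero_filled(const struct model_inode *ino,
					     long long pos)
{
	return pos >= ino->zero_point;
}
```

If model_inode_init() runs with use_zero_point set while i_size is still 0
and the size is filled in afterwards, zero_point stays at 0 and every read
in the model is zero-filled, which matches the symptom described above.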

One thing that surprised me is that this also affects cache=none; I
thought we had different file ops going straight to p9_client_read in
this case?
But, turning my brain on, this would be the read-only mmap case that we
need to support for execs, which module loading also uses, so it came
back to bite us there.

-- 
Dominique Martinet | Asmadeus
Re: [PATCH] Fix oops in NFS
Posted by David Howells 1 year, 11 months ago
Do you have CONFIG_NFS_FSCACHE set?  Are you using a cache?

David
Re: [PATCH] Fix oops in NFS
Posted by David Howells 1 year, 11 months ago
Matthew Wilcox <willy@infradead.org> wrote:

> This commit (100ccd18bb41 in linux-next 20240104) is bad for me.  After
> it, running xfstests gives me first a bunch of errors along these lines:
> 
> 00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/fs/gfs2/gfs2.ko: Exec format error
> 00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/fs/zonefs/zonefs.ko: Exec format error
> 00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/security/keys/encrypted-keys/encrypted-keys.ko: Exec format error
> 
> and then later:
> 
> 00016 generic/001       run fstests generic/001 at 2024-01-05 04:50:46
> 00017 [not run] this test requires a valid $TEST_DEV
> 00017 generic/002       run fstests generic/002 at 2024-01-05 04:50:46
> 00017 [not run] this test requires a valid $TEST_DEV
> 00017 generic/003       run fstests generic/003 at 2024-01-05 04:50:47
> 00018 [not run] this test requires a valid $SCRATCH_DEV
> ...
> 
> so I think that's page cache corruption of some kind.

Is that being run on NFS?  Is /lib on NFS?

David
Re: [PATCH] Fix oops in NFS
Posted by Matthew Wilcox 1 year, 11 months ago
On Fri, Jan 05, 2024 at 10:12:55AM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > This commit (100ccd18bb41 in linux-next 20240104) is bad for me.  After
> > it, running xfstests gives me first a bunch of errors along these lines:
> > 
> > 00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/fs/gfs2/gfs2.ko: Exec format error
> > 00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/fs/zonefs/zonefs.ko: Exec format error
> > 00004 depmod: ERROR: failed to load symbols from /lib/modules/6.7.0-rc7-00037-g100ccd18bb41/kernel/security/keys/encrypted-keys/encrypted-keys.ko: Exec format error
> > 
> > and then later:
> > 
> > 00016 generic/001       run fstests generic/001 at 2024-01-05 04:50:46
> > 00017 [not run] this test requires a valid $TEST_DEV
> > 00017 generic/002       run fstests generic/002 at 2024-01-05 04:50:46
> > 00017 [not run] this test requires a valid $TEST_DEV
> > 00017 generic/003       run fstests generic/003 at 2024-01-05 04:50:47
> > 00018 [not run] this test requires a valid $SCRATCH_DEV
> > ...
> > 
> > so I think that's page cache corruption of some kind.
> 
> Is that being run on NFS?  Is /lib on NFS?

No NFS involvement; this is supposed to be an XFS test ...

/dev/sda on / type ext4 (rw,relatime)
host on /host type 9p (rw,relatime,access=client,trans=virtio)
/dev/sdb on /mnt/test type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)

CONFIG_NETFS_SUPPORT=y
# CONFIG_NETFS_STATS is not set
# CONFIG_FSCACHE is not set
CONFIG_NETWORK_FILESYSTEMS=y
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_CEPH_FS is not set
# CONFIG_CIFS is not set
# CONFIG_SMB_SERVER is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_9P_FS=y
# CONFIG_9P_FS_POSIX_ACL is not set
# CONFIG_9P_FS_SECURITY is not set
CONFIG_NLS=y