mm/filemap.c | 6 ------ mm/shmem.c | 31 ++++++++++++++++++++----------- 2 files changed, 20 insertions(+), 17 deletions(-)
Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing
when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change.
Whilst nfsd_splice_action() does contain some questionable handling of
repeated pages, and Chuck was able to work around there, history from
Mark Hemment makes clear that there might be similar dangers elsewhere:
it was not a good idea for me to pass ZERO_PAGE down to unknown actors.
Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
instead of copy_page_to_iter(). We would use iov_iter_zero() throughout,
but the x86 clear_user() is not nearly so well optimized as copy to user
(dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds).
And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)):
which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards
Reported-by: Patrice CHOTARD <patrice.chotard@foss.st.com>
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: 56a8c8eb1eaf ("tmpfs: do not allocate pages on read")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Chuck Lever III <chuck.lever@oracle.com>
---
mm/filemap.c | 6 ------
mm/shmem.c | 31 ++++++++++++++++++++-----------
2 files changed, 20 insertions(+), 17 deletions(-)
--- 5.18-rc1/mm/filemap.c
+++ linux/mm/filemap.c
@@ -1063,12 +1063,6 @@ void __init pagecache_init(void)
init_waitqueue_head(&folio_wait_table[i]);
page_writeback_init();
-
- /*
- * tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date,
- * and splice's page_cache_pipe_buf_confirm() needs to see that.
- */
- SetPageUptodate(ZERO_PAGE(0));
}
/*
--- 5.18-rc1/mm/shmem.c
+++ linux/mm/shmem.c
@@ -2513,7 +2513,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
pgoff_t end_index;
unsigned long nr, ret;
loff_t i_size = i_size_read(inode);
- bool got_page;
end_index = i_size >> PAGE_SHIFT;
if (index > end_index)
@@ -2570,24 +2569,34 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
*/
if (!offset)
mark_page_accessed(page);
- got_page = true;
+ /*
+ * Ok, we have the page, and it's up-to-date, so
+ * now we can copy it to user space...
+ */
+ ret = copy_page_to_iter(page, offset, nr, to);
+ put_page(page);
+
+ } else if (iter_is_iovec(to)) {
+ /*
+ * Copy to user tends to be so well optimized, but
+ * clear_user() not so much, that it is noticeably
+ * faster to copy the zero page instead of clearing.
+ */
+ ret = copy_page_to_iter(ZERO_PAGE(0), offset, nr, to);
} else {
- page = ZERO_PAGE(0);
- got_page = false;
+ /*
+ * But submitting the same page twice in a row to
+ * splice() - or others? - can result in confusion:
+ * so don't attempt that optimization on pipes etc.
+ */
+ ret = iov_iter_zero(nr, to);
}
- /*
- * Ok, we have the page, and it's up-to-date, so
- * now we can copy it to user space...
- */
- ret = copy_page_to_iter(page, offset, nr, to);
retval += ret;
offset += ret;
index += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;
- if (got_page)
- put_page(page);
if (!iov_iter_count(to))
break;
if (ret < nr) {
On Fri, Apr 08, 2022 at 01:38:41PM -0700, Hugh Dickins wrote:
> + } else if (iter_is_iovec(to)) {
> + /*
> + * Copy to user tends to be so well optimized, but
> + * clear_user() not so much, that it is noticeably
> + * faster to copy the zero page instead of clearing.
> + */
> + ret = copy_page_to_iter(ZERO_PAGE(0), offset, nr, to);
Is the offset and length guaranteed to be less than PAGE_SIZE here?
Either way I'd rather do this optimization in iov_iter_zero rather
than hiding it in tmpfs.
On Sat, 9 Apr 2022, Christoph Hellwig wrote:
> On Fri, Apr 08, 2022 at 01:38:41PM -0700, Hugh Dickins wrote:
> > + } else if (iter_is_iovec(to)) {
> > + /*
> > + * Copy to user tends to be so well optimized, but
> > + * clear_user() not so much, that it is noticeably
> > + * faster to copy the zero page instead of clearing.
> > + */
> > + ret = copy_page_to_iter(ZERO_PAGE(0), offset, nr, to);
>
> Is the offset and length guaranteed to be less than PAGE_SIZE here?
Almost :) The offset is guaranteed to be less than PAGE_SIZE here, and
the length is guaranteed to be less than or equal to PAGE_SIZE - offset.
>
> Either way I'd rather do this optimization in iov_iter_zero rather
> than hiding it in tmpfs.
Let's see what others say. I think we would all prefer clear_user() to be
enhanced, and hack around it neither here in tmpfs nor in iov_iter_zero().
But that careful work won't get done by magic, nor by me.
And iov_iter_zero() has to deal with a wider range of possibilities,
when pulling in cache lines of ZERO_PAGE(0) will be less advantageous,
than in tmpfs doing a large dd - the case I'm aiming not to regress here
(tmpfs has been copying ZERO_PAGE(0) like this for years).
Hugh
On Fri, 8 Apr 2022 23:08:29 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > > > > Either way I'd rather do this optimization in iov_iter_zero rather > > than hiding it in tmpfs. > > Let's see what others say. I think we would all prefer clear_user() to be > enhanced, and hack around it neither here in tmpfs nor in iov_iter_zero(). > But that careful work won't get done by magic, nor by me. > > And iov_iter_zero() has to deal with a wider range of possibilities, > when pulling in cache lines of ZERO_PAGE(0) will be less advantageous, > than in tmpfs doing a large dd - the case I'm aiming not to regress here > (tmpfs has been copying ZERO_PAGE(0) like this for years). We do need something to get 5.18 fixed. Christoph, do you think we should proceed with this patch for 5.18?
On Tue, Apr 12, 2022 at 04:22:21PM -0700, Andrew Morton wrote: > On Fri, 8 Apr 2022 23:08:29 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > > > > > > > Either way I'd rather do this optimization in iov_iter_zero rather > > > than hiding it in tmpfs. > > > > Let's see what others say. I think we would all prefer clear_user() to be > > enhanced, and hack around it neither here in tmpfs nor in iov_iter_zero(). > > But that careful work won't get done by magic, nor by me. > > > > And iov_iter_zero() has to deal with a wider range of possibilities, > > when pulling in cache lines of ZERO_PAGE(0) will be less advantageous, > > than in tmpfs doing a large dd - the case I'm aiming not to regress here > > (tmpfs has been copying ZERO_PAGE(0) like this for years). > > We do need something to get 5.18 fixed. Christoph, do you think we > should proceed with this patch for 5.18? Well, let's queue it up then.
On Fri, Apr 08, 2022 at 11:08:29PM -0700, Hugh Dickins wrote: > > > > Either way I'd rather do this optimization in iov_iter_zero rather > > than hiding it in tmpfs. > > Let's see what others say. I think we would all prefer clear_user() to be > enhanced, and hack around it neither here in tmpfs nor in iov_iter_zero(). > But that careful work won't get done by magic, nor by me. I agree with that. > And iov_iter_zero() has to deal with a wider range of possibilities, > when pulling in cache lines of ZERO_PAGE(0) will be less advantageous, > than in tmpfs doing a large dd - the case I'm aiming not to regress here > (tmpfs has been copying ZERO_PAGE(0) like this for years). Maybe. OTOH I'd hate to have iov_iter_zero not used much because it sucks too much. So how can we entice someone with the right knowledge to implement a decent clear_user for x86?
On Tue, Apr 12, 2022 at 06:57:57AM +0200, Christoph Hellwig wrote: > On Fri, Apr 08, 2022 at 11:08:29PM -0700, Hugh Dickins wrote: > > > > > > Either way I'd rather do this optimization in iov_iter_zero rather > > > than hiding it in tmpfs. > > > > Let's see what others say. I think we would all prefer clear_user() to be > > enhanced, and hack around it neither here in tmpfs nor in iov_iter_zero(). > > But that careful work won't get done by magic, nor by me. > > I agree with that. > > > And iov_iter_zero() has to deal with a wider range of possibilities, > > when pulling in cache lines of ZERO_PAGE(0) will be less advantageous, > > than in tmpfs doing a large dd - the case I'm aiming not to regress here > > (tmpfs has been copying ZERO_PAGE(0) like this for years). > > Maybe. OTOH I'd hate to have iov_iter_zero not used much because it > sucks too much. > > So how can we entice someone with the right knowledge to implement a > decent clear_user for x86? Apparently that already happened, but it needs finishing up: https://lore.kernel.org/lkml/Yk9yBcj78mpXOOLL@zx2c4.com/
© 2016 - 2026 Red Hat, Inc.