Documentation/filesystems/netfs_library.rst | 2 +- fs/9p/vfs_addr.c | 11 +- fs/afs/file.c | 30 +- fs/afs/fsclient.c | 9 +- fs/afs/write.c | 4 +- fs/afs/yfsclient.c | 9 +- fs/cachefiles/io.c | 19 +- fs/cachefiles/xattr.c | 34 +- fs/ceph/addr.c | 76 +-- fs/netfs/Makefile | 4 +- fs/netfs/buffered_read.c | 766 ++++++++++++++++---------- fs/netfs/buffered_write.c | 309 +++++------ fs/netfs/direct_read.c | 147 ++++- fs/netfs/internal.h | 43 +- fs/netfs/io.c | 804 ---------------------------- fs/netfs/iterator.c | 50 ++ fs/netfs/main.c | 7 +- fs/netfs/misc.c | 94 ++++ fs/netfs/objects.c | 16 +- fs/netfs/read_collect.c | 544 +++++++++++++++++++ fs/netfs/read_pgpriv2.c | 264 +++++++++ fs/netfs/read_retry.c | 256 +++++++++ fs/netfs/stats.c | 27 +- fs/netfs/write_collect.c | 246 +++------ fs/netfs/write_issue.c | 93 ++-- fs/nfs/fscache.c | 19 +- fs/nfs/fscache.h | 7 +- fs/smb/client/cifsencrypt.c | 144 +---- fs/smb/client/cifsglob.h | 4 +- fs/smb/client/cifssmb.c | 6 +- fs/smb/client/file.c | 96 ++-- fs/smb/client/smb2ops.c | 219 ++++---- fs/smb/client/smb2pdu.c | 27 +- fs/smb/client/smbdirect.c | 82 +-- include/linux/folio_queue.h | 156 ++++++ include/linux/iov_iter.h | 104 ++++ include/linux/netfs.h | 46 +- include/linux/uio.h | 18 + include/trace/events/netfs.h | 144 +++-- lib/iov_iter.c | 240 ++++++++- lib/kunit_iov_iter.c | 259 +++++++++ lib/scatterlist.c | 69 ++- 42 files changed, 3520 insertions(+), 1984 deletions(-) delete mode 100644 fs/netfs/io.c create mode 100644 fs/netfs/read_collect.c create mode 100644 fs/netfs/read_pgpriv2.c create mode 100644 fs/netfs/read_retry.c create mode 100644 include/linux/folio_queue.h
Hey Linus,
/* Summary */
This contains the work to improve read/write performance for the new
netfs library.
The main performance enhancing changes are:
- Define a structure, struct folio_queue, and a new iterator type,
ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
that patch for questions about naming and form.
ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
problem with an xarray is that accessing it requires the use of a
lock (typically the RCU read lock) - and this means that we can't
supply iterate_and_advance() with a step function that might sleep
(crypto for example) without having to drop the lock between
pages. ITER_FOLIOQ is the iterator for a chain of folio_queue
structs, where each folio_queue holds a small list of folios. A
folio_queue struct is a simpler structure than xarray and is not
subject to concurrent manipulation by the VM. folio_queue is used
rather than a bvec[] as it can form lists of indefinite size,
adding to one end and removing from the other on the fly.
- Provide a copy_folio_from_iter() wrapper.
- Make cifs RDMA support ITER_FOLIOQ.
- Use folio queues in the write-side helpers instead of xarrays.
- Add a function to reset the iterator in a subrequest.
- Simplify the write-side helpers to use sheaves to skip gaps rather
than trying to work out where gaps are.
- In afs, make the read subrequests asynchronous, putting them into work
items to allow the next patch to do progressive unlocking/reading.
- Overhaul the read-side helpers to improve performance.
- Fix the caching of a partial block at the end of a file.
- Allow a store to be cancelled.
Then some changes for cifs to make it use folio queues instead of
xarrays for crypto bufferage:
- Use raw iteration functions rather than manually coding iteration when
hashing data.
- Switch to using folio_queue for crypto buffers.
- Remove the xarray bits.
Make some adjustments to the /proc/fs/netfs/stats file such that:
- All the netfs stats lines begin 'Netfs:' but change this to something
a bit more useful.
- Add a couple of stats counters to track the numbers of skips and waits
on the per-inode writeback serialisation lock to make it easier to
check for this as a source of performance loss.
Miscellaneous work:
- Ensure that the sb_writers lock is taken around
vfs_{set,remove}xattr() in the cachefiles code.
- Reduce the number of conditional branches in netfs_perform_write().
- Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
remove cifs_post_modify().
- Move the max_len/max_nr_segs members from netfs_io_subrequest to
netfs_io_request as they're only needed for one subreq at a time.
- Add an 'unknown' source value for tracing purposes.
- Remove NETFS_COPY_TO_CACHE as it's no longer used.
- Set the request work function up front at allocation time.
- Use bh-disabling spinlocks for rreq->lock as cachefiles completion may
be run from block-filesystem DIO completion in softirq context.
- Remove fs/netfs/io.c.
/* Testing */
gcc version 14.2.0 (Debian 14.2.0-3)
Debian clang version 16.0.6 (27+b1)
All patches are based on the vfs-6.11-rc7.fixes merge to bring in prerequisite
fixes in individual filesystems. All of this has been sitting in linux-next. No
build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known merge conflicts.
This has now a merge conflict with main due to some rather late cifs fixes.
This can be resolved by:
git rm fs/netfs/io.c
and then:
diff --cc fs/smb/client/cifssmb.c
index cfae2e918209,04f2a5441a89..d0df0c17b18f
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@@ -1261,16 -1261,6 +1261,14 @@@ openRetry
return rc;
}
+static void cifs_readv_worker(struct work_struct *work)
+{
+ struct cifs_io_subrequest *rdata =
+ container_of(work, struct cifs_io_subrequest, subreq.work);
+
- netfs_subreq_terminated(&rdata->subreq,
- (rdata->result == 0 || rdata->result == -EAGAIN) ?
- rdata->got_bytes : rdata->result, true);
++ netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
+}
+
static void
cifs_readv_callback(struct mid_q_entry *mid)
{
@@@ -1323,21 -1306,11 +1321,23 @@@
rdata->result = -EIO;
}
- if (rdata->result == 0 || rdata->result == -EAGAIN)
- iov_iter_advance(&rdata->subreq.io_iter, rdata->got_bytes);
+ if (rdata->result == -ENODATA) {
+ __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
+ rdata->result = 0;
+ } else {
- if (rdata->got_bytes < rdata->actual_len &&
- rdata->subreq.start + rdata->subreq.transferred + rdata->got_bytes ==
- ictx->remote_i_size) {
++ size_t trans = rdata->subreq.transferred + rdata->got_bytes;
++ if (trans < rdata->subreq.len &&
++ rdata->subreq.start + trans == ictx->remote_i_size) {
+ __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
+ rdata->result = 0;
+ }
+ }
+
rdata->credits.value = 0;
+ rdata->subreq.transferred += rdata->got_bytes;
- netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
++ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
+ INIT_WORK(&rdata->subreq.work, cifs_readv_worker);
+ queue_work(cifsiod_wq, &rdata->subreq.work);
release_mid(mid);
add_credits(server, &credits, 0);
}
diff --cc fs/smb/client/smb2pdu.c
index 88dc49d67037,95377bb91950..bb8ecbbe78af
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry
server->credits, server->in_flight,
0, cifs_trace_rw_credits_read_response_clear);
rdata->credits.value = 0;
+ rdata->subreq.transferred += rdata->got_bytes;
- if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size)
- __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
+ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
INIT_WORK(&rdata->subreq.work, smb2_readv_worker);
queue_work(cifsiod_wq, &rdata->subreq.work);
release_mid(mid);
Merge conflicts with other trees
================================
No known merge conflicts.
The following changes since commit 4356ab331c8f0dbed0f683abde345cd5503db1e4:
Merge tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs (2024-09-04 09:33:57 -0700)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.netfs
for you to fetch changes up to 4b40d43d9f951d87ae8dc414c2ef5ae50303a266:
docs: filesystems: corrected grammar of netfs page (2024-09-12 12:20:43 +0200)
Please consider pulling these changes from the signed vfs-6.12.netfs tag.
Thanks!
Christian
----------------------------------------------------------------
vfs-6.12.netfs
----------------------------------------------------------------
Christian Brauner (1):
Merge branch 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into vfs.netfs
David Howells (24):
cachefiles: Fix non-taking of sb_writers around set/removexattr
netfs: Adjust labels in /proc/fs/netfs/stats
netfs: Record contention stats for writeback lock
netfs: Reduce number of conditional branches in netfs_perform_write()
netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode
netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
netfs: Reserve netfs_sreq_source 0 as unset/unknown
netfs: Remove NETFS_COPY_TO_CACHE
netfs: Set the request work function upon allocation
netfs: Use bh-disabling spinlocks for rreq->lock
mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
iov_iter: Provide copy_folio_from_iter()
cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
netfs: Use new folio_queue data type and iterator instead of xarray iter
netfs: Provide an iterator-reset function
netfs: Simplify the writeback code
afs: Make read subreqs async
netfs: Speed up buffered reading
netfs: Remove fs/netfs/io.c
cachefiles, netfs: Fix write to partial block at EOF
netfs: Cancel dirty folios that have no storage destination
cifs: Use iterate_and_advance*() routines directly for hashing
cifs: Switch crypto buffer to use a folio_queue rather than an xarray
cifs: Don't support ITER_XARRAY
Dennis Lam (1):
docs: filesystems: corrected grammar of netfs page
Documentation/filesystems/netfs_library.rst | 2 +-
fs/9p/vfs_addr.c | 11 +-
fs/afs/file.c | 30 +-
fs/afs/fsclient.c | 9 +-
fs/afs/write.c | 4 +-
fs/afs/yfsclient.c | 9 +-
fs/cachefiles/io.c | 19 +-
fs/cachefiles/xattr.c | 34 +-
fs/ceph/addr.c | 76 +--
fs/netfs/Makefile | 4 +-
fs/netfs/buffered_read.c | 766 ++++++++++++++++----------
fs/netfs/buffered_write.c | 309 +++++------
fs/netfs/direct_read.c | 147 ++++-
fs/netfs/internal.h | 43 +-
fs/netfs/io.c | 804 ----------------------------
fs/netfs/iterator.c | 50 ++
fs/netfs/main.c | 7 +-
fs/netfs/misc.c | 94 ++++
fs/netfs/objects.c | 16 +-
fs/netfs/read_collect.c | 544 +++++++++++++++++++
fs/netfs/read_pgpriv2.c | 264 +++++++++
fs/netfs/read_retry.c | 256 +++++++++
fs/netfs/stats.c | 27 +-
fs/netfs/write_collect.c | 246 +++------
fs/netfs/write_issue.c | 93 ++--
fs/nfs/fscache.c | 19 +-
fs/nfs/fscache.h | 7 +-
fs/smb/client/cifsencrypt.c | 144 +----
fs/smb/client/cifsglob.h | 4 +-
fs/smb/client/cifssmb.c | 6 +-
fs/smb/client/file.c | 96 ++--
fs/smb/client/smb2ops.c | 219 ++++----
fs/smb/client/smb2pdu.c | 27 +-
fs/smb/client/smbdirect.c | 82 +--
include/linux/folio_queue.h | 156 ++++++
include/linux/iov_iter.h | 104 ++++
include/linux/netfs.h | 46 +-
include/linux/uio.h | 18 +
include/trace/events/netfs.h | 144 +++--
lib/iov_iter.c | 240 ++++++++-
lib/kunit_iov_iter.c | 259 +++++++++
lib/scatterlist.c | 69 ++-
42 files changed, 3520 insertions(+), 1984 deletions(-)
delete mode 100644 fs/netfs/io.c
create mode 100644 fs/netfs/read_collect.c
create mode 100644 fs/netfs/read_pgpriv2.c
create mode 100644 fs/netfs/read_retry.c
create mode 100644 include/linux/folio_queue.h
On Fri, Sep 13, 2024 at 06:56:36PM +0200, Christian Brauner wrote: > Hey Linus, > > /* Summary */ > > This contains the work to improve read/write performance for the new > netfs library. > > The main performance enhancing changes are: > > - Define a structure, struct folio_queue, and a new iterator type, > ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See > that patch for questions about naming and form. > > ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The > problem with an xarray is that accessing it requires the use of a > lock (typically the RCU read lock) - and this means that we can't > supply iterate_and_advance() with a step function that might sleep > (crypto for example) without having to drop the lock between > pages. ITER_FOLIOQ is the iterator for a chain of folio_queue > structs, where each folio_queue holds a small list of folios. A > folio_queue struct is a simpler structure than xarray and is not > subject to concurrent manipulation by the VM. folio_queue is used > rather than a bvec[] as it can form lists of indefinite size, > adding to one end and removing from the other on the fly. <...> > David Howells (24): > cachefiles: Fix non-taking of sb_writers around set/removexattr > netfs: Adjust labels in /proc/fs/netfs/stats > netfs: Record contention stats for writeback lock > netfs: Reduce number of conditional branches in netfs_perform_write() > netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode > netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream > netfs: Reserve netfs_sreq_source 0 as unset/unknown > netfs: Remove NETFS_COPY_TO_CACHE > netfs: Set the request work function upon allocation > netfs: Use bh-disabling spinlocks for rreq->lock > mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios > iov_iter: Provide copy_folio_from_iter() > cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs > netfs: Use new folio_queue data type and iterator instead of xarray iter > netfs: Provide an iterator-reset function > netfs: Simplify the writeback code > afs: Make read subreqs async > netfs: Speed up buffered reading > netfs: Remove fs/netfs/io.c > cachefiles, netfs: Fix write to partial block at EOF > netfs: Cancel dirty folios that have no storage destination > cifs: Use iterate_and_advance*() routines directly for hashing > cifs: Switch crypto buffer to use a folio_queue rather than an xarray > cifs: Don't support ITER_XARRAY Christian, David, Do you have fixes for the following issues reported for series? https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/ https://lore.kernel.org/all/4b5621958a758da830c1cf09c6f6893aed371f9d.camel@gmail.com/ https://lore.kernel.org/all/20240924094809.GA1182241@unreal/ In my case, I don't have any other workaround but simply revert these commits: "netfs: Use new folio_queue data type and iterator instead of xarray iter" "netfs: Provide an iterator-reset function" "netfs: Simplify the writeback code" "afs: Make read subreqs async" "netfs: Speed up buffered reading" "netfs: Remove fs/netfs/io.c" "cachefiles, netfs: Fix write to partial block at EOF" "netfs: Cancel dirty folios that have no storage destination" "cifs: Use iterate_and_advance*() routines directly for hashing" "cifs: Switch crypto buffer to use a folio_queue rather than an xarray" "cifs: Don't support ITER_XARRAY" "cifs: Fix signature miscalculation" "cifs: Fix cifs readv callback merge resolution issue" "cifs: Remove redundant setting of NETFS_SREQ_HIT_EOF" Thanks
Leon Romanovsky <leon@kernel.org> wrote: > Do you have fixes for the following issues reported for series? > https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/ > https://lore.kernel.org/all/4b5621958a758da830c1cf09c6f6893aed371f9d.camel@gmail.com/ > https://lore.kernel.org/all/20240924094809.GA1182241@unreal/ I'm working on a fix for the third one at the moment, I think it's the read version of the write fix I posted previously: https://lore.kernel.org/linux-fsdevel/2050099.1727359110@warthog.procyon.org.uk/ I'm looking to see if I can make a general solution that abstracts out the buffer handling for both as we're early in the cycle. I haven't looked at the first two yet. I was in the middle of testing 9p when I hit the write-side bug that cifs was also seeing (ie. the url above). David
On Fri, Sep 27, 2024 at 09:01:19AM +0100, David Howells wrote: > Leon Romanovsky <leon@kernel.org> wrote: > > > Do you have fixes for the following issues reported for series? > > https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/ > > https://lore.kernel.org/all/4b5621958a758da830c1cf09c6f6893aed371f9d.camel@gmail.com/ > > https://lore.kernel.org/all/20240924094809.GA1182241@unreal/ > > I'm working on a fix for the third one at the moment, I think it's the read > version of the write fix I posted previously: > > https://lore.kernel.org/linux-fsdevel/2050099.1727359110@warthog.procyon.org.uk/ > > I'm looking to see if I can make a general solution that abstracts out the > buffer handling for both as we're early in the cycle. I hope that you mean that we have plenty of time before the merge window ends. Otherwise, it will be very inconvenient to open official -next/-rc branches, based on -rc1, remembering to revert so many commits. It is better to have fast fixes and then work on improving them without worrying about time frames. Thanks
Leon Romanovsky <leon@kernel.org> wrote: > I hope that you mean that we have plenty of time before the merge window ends. > Otherwise, it will be very inconvenient to open official -next/-rc branches, > based on -rc1, remembering to revert so many commits. Yes, I'm aware of the time, thank you. Can you please try the branch at: https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes see if it fixes your problem? It runs through -g quick with 9p over TCP for me. David
On Fri, Sep 27, 2024 at 09:31:28PM +0100, David Howells wrote: > Leon Romanovsky <leon@kernel.org> wrote: > > > I hope that you mean that we have plenty of time before the merge window ends. > > Otherwise, it will be very inconvenient to open official -next/-rc branches, > > based on -rc1, remembering to revert so many commits. > > Yes, I'm aware of the time, thank you. > > Can you please try the branch at: > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes > > see if it fixes your problem? It runs through -g quick with 9p over TCP for > me. I tried it now and it didn't fix my issue. ➜ kernel git:(netfs-fixes) git l 8e18fe180b0a netfs: Abstract out a rolling folio buffer implementation [ 1.506633][ T1] Run /sbin/init as init process [ 1.506800][ T1] with arguments: [ 1.506895][ T1] /sbin/init [ 1.507004][ T1] with environment: [ 1.507104][ T1] HOME=/ [ 1.507201][ T1] TERM=linux [ 1.512981][ T1] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x6ce48 [ 1.513934][ T1] flags: 0x4000000000000000(zone=1) [ 1.514142][ T1] raw: 4000000000000000 ffffea0001b39248 ffffea00001583c8 0000000000000000 [ 1.514536][ T1] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1.514796][ T1] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u)) [ 1.515087][ T1] ------------[ cut here ]------------ [ 1.515231][ T1] kernel BUG at include/linux/mm.h:1444! [ 1.515375][ T1] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN [ 1.515596][ T1] CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0+ #2546 [ 1.515823][ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 1.516153][ T1] RIP: 0010:__iov_iter_get_pages_alloc+0x16d4/0x2210 [ 1.516345][ T1] Code: 84 f2 fa ff ff 48 89 ef e8 c9 20 98 ff e9 e5 fa ff ff 48 8d 48 ff e9 2c fe ff ff 48 c7 c6 00 eb 21 83 48 89 cf e8 fc 25 8a ff <0f> 0b 48 b8 00 00 00 00 00 fc ff df 4c 8b 74 24 68 44 8b 5c 24 30 [ 1.516874][ T1] RSP: 0000:ffff8880060f6e40 EFLAGS: 00010286 [ 1.517126][ T1] RAX: 000000000000005c RBX: ffffea0001b39234 RCX: 0000000000000000 [ 1.517434][ T1] RDX: 000000000000005c RSI: 0000000000000004 RDI: ffffed1000c1edbb [ 1.517676][ T1] RBP: dffffc0000000000 R08: 0000000000000000 R09: fffffbfff0718ce0 [ 1.517898][ T1] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88800950f420 [ 1.518111][ T1] R13: ffff88800aba0c00 R14: 0000000000000002 R15: 0000000000001000 [ 1.518321][ T1] FS: 0000000000000000(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 [ 1.518587][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.518804][ T1] CR2: 0000000000000000 CR3: 0000000003881001 CR4: 0000000000370eb0 [ 1.519058][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.519302][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1.519553][ T1] Call Trace: [ 1.519687][ T1] <TASK> [ 1.519790][ T1] ? __die+0x52/0x8f [ 1.519947][ T1] ? die+0x2a/0x50 [ 1.520107][ T1] ? do_trap+0x1d9/0x2c0 [ 1.520227][ T1] ? __iov_iter_get_pages_alloc+0x16d4/0x2210 [ 1.520421][ T1] ? do_error_trap+0xa3/0x160 [ 1.520599][ T1] ? __iov_iter_get_pages_alloc+0x16d4/0x2210 [ 1.520813][ T1] ? handle_invalid_op+0x2c/0x30 [ 1.520991][ T1] ? __iov_iter_get_pages_alloc+0x16d4/0x2210 [ 1.521245][ T1] ? exc_invalid_op+0x29/0x40 [ 1.521438][ T1] ? asm_exc_invalid_op+0x16/0x20 [ 1.521601][ T1] ? __iov_iter_get_pages_alloc+0x16d4/0x2210 [ 1.521799][ T1] ? iov_iter_extract_pages+0x1ee0/0x1ee0 [ 1.521954][ T1] ? radix_tree_node_alloc.constprop.0+0x16a/0x2c0 [ 1.522148][ T1] ? lock_acquire+0xe2/0x500 [ 1.522308][ T1] ? mark_lock+0xfc/0x2dc0 [ 1.522462][ T1] iov_iter_get_pages_alloc2+0x3d/0xe0 [ 1.522624][ T1] ? print_usage_bug.part.0+0x600/0x600 [ 1.522795][ T1] p9_get_mapped_pages.part.0.constprop.0+0x3bf/0x6c0 [ 1.523025][ T1] ? p9pdu_vwritef+0x320/0x1f20 [ 1.523239][ T1] ? p9_virtio_request+0x550/0x550 [ 1.523427][ T1] ? pdu_read+0xc0/0xc0 [ 1.523551][ T1] ? lock_release+0x220/0x780 [ 1.523714][ T1] ? pdu_read+0xc0/0xc0 [ 1.523824][ T1] p9_virtio_zc_request+0x728/0x1020 [ 1.523975][ T1] ? p9pdu_vwritef+0x320/0x1f20 [ 1.524119][ T1] ? p9_virtio_probe+0xa20/0xa20 [ 1.524271][ T1] ? netfs_read_to_pagecache+0x600/0xd90 [ 1.524424][ T1] ? mark_lock+0xfc/0x2dc0 [ 1.524586][ T1] ? p9pdu_finalize+0xdc/0x1d0 [ 1.524762][ T1] ? p9_client_prepare_req+0x235/0x360 [ 1.524918][ T1] ? p9_tag_alloc+0x6e0/0x6e0 [ 1.525084][ T1] ? lock_release+0x220/0x780 [ 1.525241][ T1] p9_client_zc_rpc.constprop.0+0x236/0x7d0 [ 1.525434][ T1] ? __create_object+0x5e/0x80 [ 1.525595][ T1] ? p9_client_flush.isra.0+0x390/0x390 [ 1.525746][ T1] ? lockdep_hardirqs_on_prepare+0x268/0x3e0 [ 1.525911][ T1] ? __call_rcu_common.constprop.0+0x475/0xc80 [ 1.526092][ T1] ? p9_req_put+0x17a/0x200 [ 1.526226][ T1] p9_client_read_once+0x343/0x840 [ 1.526361][ T1] ? p9_client_getlock_dotl+0x3c0/0x3c0 [ 1.526501][ T1] p9_client_read+0xf1/0x150 [ 1.526649][ T1] v9fs_issue_read+0x107/0x300 [ 1.526788][ T1] ? v9fs_issue_write+0x140/0x140 [ 1.526934][ T1] ? __rwlock_init+0x150/0x150 [ 1.527073][ T1] ? lockdep_hardirqs_on_prepare+0x268/0x3e0 [ 1.527249][ T1] netfs_read_to_pagecache+0x600/0xd90 [ 1.527396][ T1] netfs_readahead+0x47a/0x960 [ 1.527552][ T1] read_pages+0x17b/0xaf0 [ 1.527676][ T1] ? lru_move_tail+0x8f0/0x8f0 [ 1.527842][ T1] ? file_ra_state_init+0xd0/0xd0 [ 1.528004][ T1] page_cache_ra_unbounded+0x324/0x5f0 [ 1.528167][ T1] filemap_get_pages+0x597/0x16b0 [ 1.528324][ T1] ? filemap_add_folio+0x140/0x140 [ 1.528476][ T1] ? lock_is_held_type+0x81/0xe0 [ 1.528637][ T1] filemap_read+0x2ec/0xa90 [ 1.528799][ T1] ? filemap_get_pages+0x16b0/0x16b0 [ 1.528969][ T1] ? 0xffffffff81000000 [ 1.529090][ T1] ? find_held_lock+0x2d/0x110 [ 1.529286][ T1] ? lock_is_held_type+0x81/0xe0 [ 1.529437][ T1] ? down_read_interruptible+0x1f6/0x490 [ 1.529601][ T1] ? down_read+0x450/0x450 [ 1.529748][ T1] ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 [ 1.529953][ T1] ? find_held_lock+0x2d/0x110 [ 1.530098][ T1] netfs_buffered_read_iter+0xe2/0x130 [ 1.530254][ T1] ? netfs_file_read_iter+0xb2/0x130 [ 1.530398][ T1] __kernel_read+0x2db/0x8a0 [ 1.530554][ T1] ? __x64_sys_lseek+0x1d0/0x1d0 [ 1.530700][ T1] bprm_execve+0x548/0x1410 [ 1.530864][ T1] ? setup_arg_pages+0xb40/0xb40 [ 1.531007][ T1] ? __cond_resched+0x17/0x70 [ 1.531152][ T1] kernel_execve+0x26a/0x2f0 [ 1.531296][ T1] try_to_run_init_process+0xf/0x30 [ 1.531445][ T1] ? rest_init+0x1b0/0x1b0 [ 1.531595][ T1] kernel_init+0xe2/0x140 [ 1.531704][ T1] ? _raw_spin_unlock_irq+0x24/0x30 [ 1.531862][ T1] ret_from_fork+0x2d/0x70 [ 1.532013][ T1] ? rest_init+0x1b0/0x1b0 [ 1.532162][ T1] ret_from_fork_asm+0x11/0x20 [ 1.532342][ T1] </TASK> [ 1.532463][ T1] Modules linked in: [ 1.532679][ T1] ---[ end trace 0000000000000000 ]--- [ 1.532827][ T1] RIP: 0010:__iov_iter_get_pages_alloc+0x16d4/0x2210 [ 1.533109][ T1] Code: 84 f2 fa ff ff 48 89 ef e8 c9 20 98 ff e9 e5 fa ff ff 48 8d 48 ff e9 2c fe ff ff 48 c7 c6 00 eb 21 83 48 89 cf e8 fc 25 8a ff <0f> 0b 48 b8 00 00 00 00 00 fc ff df 4c 8b 74 24 68 44 8b 5c 24 30 [ 1.533633][ T1] RSP: 0000:ffff8880060f6e40 EFLAGS: 00010286 [ 1.533847][ T1] RAX: 000000000000005c RBX: ffffea0001b39234 RCX: 0000000000000000 [ 1.534054][ T1] RDX: 000000000000005c RSI: 0000000000000004 RDI: ffffed1000c1edbb [ 1.534261][ T1] RBP: dffffc0000000000 R08: 0000000000000000 R09: fffffbfff0718ce0 [ 1.534459][ T1] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88800950f420 [ 1.534668][ T1] R13: ffff88800aba0c00 R14: 0000000000000002 R15: 0000000000001000 [ 1.534878][ T1] FS: 0000000000000000(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 [ 1.535129][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.535309][ T1] CR2: 0000000000000000 CR3: 0000000003881001 CR4: 0000000000370eb0 [ 1.535517][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.535734][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1.535953][ T1] ------------[ cut here ]------------ > > David >
The pull request you sent on Fri, 13 Sep 2024 18:56:36 +0200: > git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.netfs has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/35219bc5c71f4197c8bd10297597de797c1eece5 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
On Fri, 13 Sept 2024 at 18:57, Christian Brauner <brauner@kernel.org> wrote: > > /* Conflicts */ > > Merge conflicts with mainline Hmm. My conflict resolution is _similar_, but at the same time decidedly different. And I'm not sure why yours is different. > --- a/fs/smb/client/cifssmb.c > +++ b/fs/smb/client/cifssmb.c > @@@ -1261,16 -1261,6 +1261,14 @@@ openRetry > return rc; > } > > +static void cifs_readv_worker(struct work_struct *work) > +{ > + struct cifs_io_subrequest *rdata = > + container_of(work, struct cifs_io_subrequest, subreq.work); > + > - netfs_subreq_terminated(&rdata->subreq, > - (rdata->result == 0 || rdata->result == -EAGAIN) ? > - rdata->got_bytes : rdata->result, true); > ++ netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false); So here, I have ++ netfs_read_subreq_terminated(&rdata->subreq, rdata->result, true); with the third argument being 'true' instead of 'false' as in yours. The reason? That's what commit a68c74865f51 ("cifs: Fix SMB1 readv/writev callback in the same way as SMB2/3") did when it moved the (originally) netfs_subreq_terminated() into the worker, and it changed the 'was_async' argument from "false" to a "true". Now, that change makes little sense to me (a worker thread is not softirq context), but that's what commit a68c74865f51 does, and so that's logically what the merge should do. > + rdata->subreq.transferred += rdata->got_bytes; > - netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false); > ++ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress); where did this trace_netfs_sreq() come from? > --- a/fs/smb/client/smb2pdu.c > +++ b/fs/smb/client/smb2pdu.c > @@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry > server->credits, server->in_flight, > 0, cifs_trace_rw_credits_read_response_clear); > rdata->credits.value = 0; > + rdata->subreq.transferred += rdata->got_bytes; > - if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size) > - __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags); > + trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress); And where did this conflict resolution come from? I'm not seeing why it removes that NETFS_SREQ_HIT_EOF bit logic.. Some searching gets me this: https://lore.kernel.org/all/1131388.1726141806@warthog.procyon.org.uk/ which seems to explain your merge resolution, and I'm even inclined to think that it might be right, but it's not *sensible*. That whole removal of the NETFS_SREQ_HIT_EOF bit ends up undoing part of commit ee4cdf7ba857 ("netfs: Speed up buffered reading"), and doesn't seem to be supported by the changes done on the other side of the conflict resolution. IOW, the changes may be *correct*, but they do not seem to be valid as a conflict resolution, and if they are fixes they should be done as such separately. Adding DavidH (and Steve French) to the participants, so that they can know about my confusion, and maybe send a patch to fix it up properly with actual explanations. Because I don't want to commit the merge as you suggest without explanations for why those changes were magically done independently of what seems to be happening in the development history. Linus
Linus Torvalds <torvalds@linux-foundation.org> wrote: > > ++ netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false); > > So here, I have > > ++ netfs_read_subreq_terminated(&rdata->subreq, rdata->result, true); > > with the third argument being 'true' instead of 'false' as in yours. > > The reason? That's what commit a68c74865f51 ("cifs: Fix SMB1 > readv/writev callback in the same way as SMB2/3") did when it moved > the (originally) netfs_subreq_terminated() into the worker, and it > changed the 'was_async' argument from "false" to a "true". As part of these changes, the callback to netfslib from the SMB1 transport variant is now delegated to a separate worker thread by cifs_readv_callback() rather than being done in the cifs network processing thread (e.g. as is done by the SMB2/3 smb2_readv_worker() in smb2pdu.c), so it's better to pass "false" here. All that argument does is tell netfslib whether it can do cleanup processing and retrying in the calling thread (if "false") or whether it needs to offload it to another thread (if "true"). I should probably rename the argument from "was_async" to something more explanatory. By putting "true" here, it causes the already offloaded processing to further offload unnecessarily. It shouldn't break things though. > > + rdata->subreq.transferred += rdata->got_bytes; > > - netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false); > > ++ trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress); > > where did this trace_netfs_sreq() come from? It got copied across with other lines when sync'ing the code with smb2_readv_callback() whilst attempting the merge resolution. It's something that got missed out when porting the changes I'd made to SMB2/3 to SMB1. It should have been deferred to a follow up patch. > > --- a/fs/smb/client/smb2pdu.c > > +++ b/fs/smb/client/smb2pdu.c > > @@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry > > server->credits, server->in_flight, > > 0, cifs_trace_rw_credits_read_response_clear); > > rdata->credits.value = 0; > > + rdata->subreq.transferred += rdata->got_bytes; > > - if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size) > > - __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags); > > + trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress); > > And where did this conflict resolution come from? I'm not seeing why > it removes that NETFS_SREQ_HIT_EOF bit logic.. A fix that went upstream via SteveF's tree rather than Christian's tree added NETFS_SREQ_HIT_EOF separately: 1da29f2c39b67b846b74205c81bf0ccd96d34727 netfs, cifs: Fix handling of short DIO read The code that added to twiddle NETFS_SREQ_HIT_EOF is in the source, just above the lines in the hunk above: if (rdata->result == -ENODATA) { __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags); rdata->result = 0; } else { size_t trans = rdata->subreq.transferred + rdata->got_bytes; if (trans < rdata->subreq.len && rdata->subreq.start + trans == ictx->remote_i_size) { __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags); rdata->result = 0; } } The two lines removed in the example resolution are therefore redundant and should have been removed, but weren't. David
Fix an upstream merge resolution issue[1]. Prior to the netfs read
healpers, the SMB1 asynchronous read callback, cifs_readv_worker()
performed the cleanup for the operation in the network message processing
loop, potentially slowing down the processing of incoming SMB messages.
With commit a68c74865f51 ("cifs: Fix SMB1 readv/writev callback in the same
way as SMB2/3"), this was moved to a worker thread (as is done in the
SMB2/3 transport variant). However, the "was_async" argument to
netfs_subreq_terminated (which was originally incorrectly "false" got
flipped to "true" - which was then incorrect because, being in a kernel
thread, it's not in an async context).
This got corrected in the sample merge[2], but Linus, not unreasonably,
switched it back to its previous value.
Note that this value tells netfslib whether or not it can run sleepable
stuff or stuff that takes a long time, such as retries and cleanups, in the
calling thread, or whether it should offload to a worker thread.
Fix this so that it is "false". The callback to netfslib in both SMB1 and
SMB2/3 now gets offloaded from the network message thread to a separate
worker thread and thus it's fine to do the slow work in this thread.
Fixes: 35219bc5c71f ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
---
fs/smb/client/cifssmb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index d81da161d3ed..7f3b37120a21 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1266,7 +1266,7 @@ static void cifs_readv_worker(struct work_struct *work)
struct cifs_io_subrequest *rdata =
container_of(work, struct cifs_io_subrequest, subreq.work);
- netfs_read_subreq_terminated(&rdata->subreq, rdata->result, true);
+ netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
}
static void
On Mon, 16 Sept 2024 at 15:50, David Howells <dhowells@redhat.com> wrote: > > Fix this so that it is "false". The callback to netfslib in both SMB1 and > SMB2/3 now gets offloaded from the network message thread to a separate > worker thread and thus it's fine to do the slow work in this thread. Note that with this fixed, now *every* direct call of netfs_read_subreq_terminated() has that 'was_aync' as false. The exception ends up being the netfs_read_cache_to_pagecache() thing, which does that 'cres->ops->read()' with netfs_read_subreq_terminated() as a callback function. And that callback ends up being done with ki->was_async, which is actually set unconditionally to 'true' (except for the immediate failure case which then sets it to false after all). Could we please just remove that whole 'was_async' case entirely, and just make the cres->ops->read() path just do a workqueue (which seems to be what the true case does anyway)? So then the netfs_read_subreq_terminated() wouldn't need to take that pointless argument, with the only case of async use just fixing itself? Wouldn't that be cleaner? Linus
Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Could we please just remove that whole 'was_async' case entirely, and > just make the cres->ops->read() path just do a workqueue (which seems > to be what the true case does anyway)? > > So then the netfs_read_subreq_terminated() wouldn't need to take that > pointless argument, with the only case of async use just fixing > itself? Wouldn't that be cleaner? It's probably a good idea, but there's also erofs, which also goes through cachefiles_read() with it's own async callback which complicates things a little. David
On Mon, 16 Sept 2024 at 17:33, David Howells <dhowells@redhat.com> wrote: > > It's probably a good idea, but there's also erofs, which also goes through > cachefiles_read() with it's own async callback which complicates things a > little. So I was thinking that if cachefiles_read_complete() would just do the ->term_func() handling as a workqueue thing, that would make this all go away... Linus
Fix an upstream merge resolution issue[1]. The NETFS_SREQ_HIT_EOF flag,
and code to set it, got added via two different paths. The original path
saw it added in the netfslib read improvements[2], but it was also added,
and slightly differently, in a fix that was committed before v6.11:
1da29f2c39b67b846b74205c81bf0ccd96d34727
netfs, cifs: Fix handling of short DIO read
However, the code added to smb2_readv_callback() to set the flag in didn't
get removed when the netfs read improvements series was rebased to take
account of the cifs fixes. The proposed merge resolution[2] deleted it
rather than rebase the patches.
Fix this by removing the redundant lines. Code to set the bit that derives
from the fix patch is still there, a few lines above in the source.
Fixes: 35219bc5c71f ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
---
fs/smb/client/smb2pdu.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 95377bb91950..bb8ecbbe78af 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -4614,8 +4614,6 @@ smb2_readv_callback(struct mid_q_entry *mid)
0, cifs_trace_rw_credits_read_response_clear);
rdata->credits.value = 0;
rdata->subreq.transferred += rdata->got_bytes;
- if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size)
- __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
INIT_WORK(&rdata->subreq.work, smb2_readv_worker);
queue_work(cifsiod_wq, &rdata->subreq.work);
© 2016 - 2024 Red Hat, Inc.