[PATCH] netfs: Fix kernel async DIO

David Howells posted 1 patch 1 year, 1 month ago
There is a newer version of this series
fs/netfs/direct_write.c |    7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
[PATCH] netfs: Fix kernel async DIO
Posted by David Howells 1 year, 1 month ago
Hi Nicolas,

Does the attached fix your problem?

David
---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

        mount //my/cifs/share /foo
        dd if=/dev/zero of=/foo/m0 bs=4K count=1K
        losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
        echo hello >/dev/loop2046

This causes the following to appear in syslog:

        WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index eded8afaa60b..42ce53cc216e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		 * allocate a sufficiently large bvec array and may shorten the
 		 * request.
 		 */
-		if (async || user_backed_iter(iter)) {
+		if (user_backed_iter(iter)) {
 			n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
 			if (n < 0) {
 				ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 		} else {
+			/* If this is a kernel-generated async DIO request,
+			 * assume that any resources the iterator points to
+			 * (eg. a bio_vec array) will persist till the end of
+			 * the op.
+			 */
 			wreq->buffer.iter = *iter;
 		}
 	}
Re: [PATCH] netfs: Fix kernel async DIO
Posted by Paulo Alcantara 1 year, 1 month ago
David Howells <dhowells@redhat.com> writes:

> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
> is supplied with a bio_vec[] array.  Currently, because of the async flag,
> this gets passed to netfs_extract_user_iter() which throws a warning and
> fails because it only handles IOVEC and UBUF iterators.  This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
>         mount //my/cifs/share /foo
>         dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>         losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>         echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
>         WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
> causes async kernel DIO writes to be handled as userspace writes.  Note
> that this change relies on the kernel caller maintaining the existence of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/direct_write.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

LGTM.  Feel free to add:

Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>

Thanks Christoph and Dave!
Re: [PATCH] netfs: Fix kernel async DIO
Posted by Christoph Hellwig 1 year, 1 month ago
On Mon, Jan 06, 2025 at 11:37:24AM +0000, David Howells wrote:
>         mount //my/cifs/share /foo
>         dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>         losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>         echo hello >/dev/loop2046

Can you add a testcase using losetup --direct-io with a file on
$TEST_DIR so that we get coverage for ITER_BVEC directio to xfstests?
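A minimal sketch of what such a test could look like, in xfstests style. The test grouping and helper names (`_begin_fstest`, `_require_*`, `_create_loop_device`, `_destroy_loop_device`) follow xfstests conventions but are assumed here, not taken from this thread; the script needs root and the xfstests harness to run.

```shell
#! /bin/bash
# Hypothetical xfstests-style reproducer: ITER_BVEC direct I/O through a
# loop device whose backing file lives on $TEST_DIR (e.g. a cifs mount).

. ./common/preamble
_begin_fstest auto quick rw

_require_test
_require_loop
_require_odirect

# Back the loop device with a file on the filesystem under test.
dd if=/dev/zero of=$TEST_DIR/loopfile bs=4k count=1024 status=none

# --direct-io=on makes the loop driver issue kernel-initiated async DIO
# against the backing file, which is what triggered the netfs WARN.
loopdev=$(losetup --find --show --sector-size 4096 --direct-io=on \
	  $TEST_DIR/loopfile)

echo hello > $loopdev || echo "DIO write through loop device failed"

losetup -d $loopdev
echo "Silence is golden"
status=0
exit
```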
[Linux 6.14 - netfs/cifs] loop on file cat + file copy
Posted by Nicolas Baranger 10 months, 3 weeks ago
Hi Christoph

Sorry to contact you again, but last time you and David H. helped me a lot 
with 'kernel async DIO' / 'Losetup Direct I/O breaks BACK-FILE 
filesystem on CIFS share (appears in Linux 6.10 and reproduced on 
mainline)'.

I don't know if it has already been reported, but after building Linux 
6.14-rc1 I observe the following behaviour:

the 'cat' command goes into a loop when I cat a file which resides on a 
cifs share.

The 'cp' command does the same: it copies the content of a file on the 
cifs share and loops writing it to the destination.
I tested with a file named 'toto' containing only the ascii string 
'toto'.

When I started copying it from the cifs share to the local filesystem, I 
had to CTRL+C the copy of this 5-byte file after some time because the 
destination file had used all the filesystem's free space and contained 
billions of 'toto' lines.

Here is an example with cat:

CIFS SHARE is mounted as /mnt/fbx/FBX-24T

CIFS mount options:
grep cifs /proc/mounts
//10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=none,upcall_target=app,username=fbx,domain=HOMELAN,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.10.100,file_mode=0666,dir_mode=0755,iocharset=utf8,soft,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,nativesocket,symlink=mfsymlinks,rsize=65536,wsize=65536,bsize=16777216,retrans=1,echo_interval=60,actimeo=1,closetimeo=1 0 0

KERNEL: uname -a
Linux 14RV-SERVER.14rv.lan 6.14.0.1-ast-rc2-amd64 #0 SMP PREEMPT_DYNAMIC Wed Feb 12 18:23:00 CET 2025 x86_64 GNU/Linux


To be reproduced:
echo toto >/mnt/fbx/FBX-24T/toto

ls -l /mnt/fbx/FBX-24T/toto
-rw-rw-rw- 1 root root 5 20 mars  09:20 /mnt/fbx/FBX-24T/toto

cat /mnt/fbx/FBX-24T/toto
toto
toto
toto
toto
toto
toto
toto
^C

strace cat /mnt/fbx/FBX-24T/toto
execve("/usr/bin/cat", ["cat", "/mnt/fbx/FBX-24T/toto"], 0x7ffc39b41848 /* 19 vars */) = 0
        [... dynamic-loader and locale setup elided: ~40 openat()/newfstatat() probes for libc.so.6 under glibc-hwcaps/tls/cuda paths (all ENOENT, "No such file or directory"), libc mapping, heap and locale setup ...]
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
openat(AT_FDCWD, "/mnt/fbx/FBX-24T/toto", O_RDONLY) = 3
newfstatat(3, "", {st_mode=S_IFREG|0666, st_size=5, ...}, AT_EMPTY_PATH) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 16785408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55f7ffe000
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680) = 16711680
        [... the identical read()/write() pair repeats until interrupted ...]
^Cstrace: Process 38427 detached
  <detached ...>


Please let me know if it has already been fixed or reported, and whether 
you're able to reproduce this issue.

Thanks for help

Kind regards
Nicolas Baranger
Re: [PATCH] netfs: Fix kernel async DIO
Posted by nicolas.baranger@3xo.fr 1 year, 1 month ago
Hi David

Thanks for the work!
I will build Linux 6.10 and mainline with the provided change and will 
come back as soon as I get test results (CET working time).

Thanks again for help in this issue
Nicolas

On 2025-01-06 12:37, David Howells wrote:

> Hi Nicolas,
> 
> Does the attached fix your problem?
> 
> David
> ---
> netfs: Fix kernel async DIO
> 
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO 
> that
> is supplied with a bio_vec[] array.  Currently, because of the async 
> flag,
> this gets passed to netfs_extract_user_iter() which throws a warning 
> and
> fails because it only handles IOVEC and UBUF iterators.  This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
> 
> mount //my/cifs/share /foo
> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> echo hello >/dev/loop2046
> 
> This causes the following to appear in syslog:
> 
> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 
> netfs_extract_user_iter+0x170/0x250 [netfs]
> 
> and the write to fail.
> 
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() 
> that
> causes async kernel DIO writes to be handled as userspace writes.  Note
> that this change relies on the kernel caller maintaining the existence 
> of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
> 
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: 
> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/direct_write.c |    7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index eded8afaa60b..42ce53cc216e 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct 
> kiocb *iocb, struct iov_iter *
> * allocate a sufficiently large bvec array and may shorten the
> * request.
> */
> -        if (async || user_backed_iter(iter)) {
> +        if (user_backed_iter(iter)) {
> n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
> if (n < 0) {
> ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct 
> kiocb *iocb, struct iov_iter *
> wreq->direct_bv_count = n;
> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
> } else {
> +            /* If this is a kernel-generated async DIO request,
> +             * assume that any resources the iterator points to
> +             * (eg. a bio_vec array) will persist till the end of
> +             * the op.
> +             */
> wreq->buffer.iter = *iter;
> }
> }
Re: [PATCH] netfs: Fix kernel async DIO
Posted by nicolas.baranger@3xo.fr 1 year, 1 month ago
Hi David

As your patch was written on top of linux-next, I needed to make some 
small modifications to make it work on mainline (6.13-rc6).
The following patch is working fine for me on mainline, but I think it 
would be better to wait for your confirmation / validation (or a new 
patch) before applying it in production.

#-------- PATCH --------#

diff --git a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c b/linux-6.13-rc6/fs/netfs/direct_write.c
index 88f2adf..94a1ee8 100644
--- a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
+++ b/linux-6.13-rc6/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		 * allocate a sufficiently large bvec array and may shorten the
 		 * request.
 		 */
-		if (async || user_backed_iter(iter)) {
+		if (user_backed_iter(iter)) {
 			n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
 			if (n < 0) {
 				ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 		} else {
+			/* If this is a kernel-generated async DIO request,
+			 * assume that any resources the iterator points to
+			 * (eg. a bio_vec array) will persist till the end of
+			 * the op.
+			 */
 			wreq->iter = *iter;
 		}


#-------- TESTS --------#

Using this patch, Linux 6.13-rc6 builds with no errors and '--direct-io=on' 
works:


18:38:47 root@deb12-lab-10d:~# uname -a
Linux deb12-lab-10d.lab.lan 6.13.0-rc6-amd64 #0 SMP PREEMPT_DYNAMIC Mon 
Jan  6 18:14:07 CET 2025 x86_64 GNU/Linux

18:39:29 root@deb12-lab-10d:~# losetup
NAME          SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                         DIO LOG-SEC
/dev/loop2046         0      0         0  0 /mnt/FBX24T/FS-LAN/bckcrypt2046   1    4096

18:39:32 root@deb12-lab-10d:~# dmsetup ls | grep bckcrypt
bckcrypt    (254:7)

18:39:55 root@deb12-lab-10d:~# cryptsetup status bckcrypt
/dev/mapper/bckcrypt is active and is in use.
   type:    LUKS2
   cipher:  aes-xts-plain64
   keysize: 512 bits
   key location: keyring
   device:  /dev/loop2046
   loop:    /mnt/FBX24T/FS-LAN/bckcrypt2046
   sector size:  512
   offset:  32768 sectors
   size:    8589901824 sectors
   mode:    read/write

18:40:36 root@deb12-lab-10d:~# df -h | egrep 'cifs|bckcrypt'
//10.0.10.100/FBX24T      cifs        22T     13T  9,0T  60% /mnt/FBX24T
/dev/mapper/bckcrypt      btrfs      4,0T    3,3T  779G  82% /mnt/bckcrypt


09:08:44 root@deb12-lab-10d:~# LANG=en_US.UTF-8
09:08:46 root@deb12-lab-10d:~# dd if=/dev/zero of=/mnt/bckcrypt/test/test.dd bs=256M count=16 oflag=direct status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14 s, 302 MB/s
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14.2061 s, 302 MB/s



No write errors using losetup's '--direct-io=on' option with this patch; 
writing to the back-file is more than 20x faster.
It seems to be OK!

Let me know if something is wrong with this patch or if it can safely be 
used in production.

Again thanks everyone for help.
Nicolas



On 2025-01-06 13:07, nicolas.baranger@3xo.fr wrote:

> Hi David
> 
> Thanks for the job !
> I will buid Linux 6.10 and mainline with the provided change and I'm 
> comming here as soon as I get results from tests (CET working time).
> 
> Thanks again for help in this issue
> Nicolas
> 
> On 2025-01-06 12:37, David Howells wrote:
> 
>> Hi Nicolas,
>> 
>> Does the attached fix your problem?
>> 
>> David
>> ---
>> netfs: Fix kernel async DIO
>> 
>> Netfslib needs to be able to handle kernel-initiated asynchronous DIO 
>> that
>> is supplied with a bio_vec[] array.  Currently, because of the async 
>> flag,
>> this gets passed to netfs_extract_user_iter() which throws a warning 
>> and
>> fails because it only handles IOVEC and UBUF iterators.  This can be
>> triggered through a combination of cifs and a loopback blockdev with
>> something like:
>> 
>> mount //my/cifs/share /foo
>> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>> echo hello >/dev/loop2046
>> 
>> This causes the following to appear in syslog:
>> 
>> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 
>> netfs_extract_user_iter+0x170/0x250 [netfs]
>> 
>> and the write to fail.
>> 
>> Fix this by removing the check in netfs_unbuffered_write_iter_locked() 
>> that
>> causes async kernel DIO writes to be handled as userspace writes.  
>> Note
>> that this change relies on the kernel caller maintaining the existence 
>> of
>> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>> 
>> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
>> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
>> Closes: 
>> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Steve French <smfrench@gmail.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: netfs@lists.linux.dev
>> cc: linux-cifs@vger.kernel.org
>> cc: linux-fsdevel@vger.kernel.org
>> ---
>> fs/netfs/direct_write.c |    7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
>> index eded8afaa60b..42ce53cc216e 100644
>> --- a/fs/netfs/direct_write.c
>> +++ b/fs/netfs/direct_write.c
>> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct 
>> kiocb *iocb, struct iov_iter *
>> * allocate a sufficiently large bvec array and may shorten the
>> * request.
>> */
>> -        if (async || user_backed_iter(iter)) {
>> +        if (user_backed_iter(iter)) {
>> n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
>> if (n < 0) {
>> ret = n;
>> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct 
>> kiocb *iocb, struct iov_iter *
>> wreq->direct_bv_count = n;
>> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
>> } else {
>> +            /* If this is a kernel-generated async DIO request,
>> +             * assume that any resources the iterator points to
>> +             * (eg. a bio_vec array) will persist till the end of
>> +             * the op.
>> +             */
>> wreq->buffer.iter = *iter;
>> }
>> }
Re: [PATCH] netfs: Fix kernel async DIO
Posted by David Howells 1 year, 1 month ago
Thanks!

I ported the patch to linus/master (see below) and it looks pretty much the
same as yours, give or take tabs getting converted to spaces.

Could I put you down as a Tested-by?

David

---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

        mount //my/cifs/share /foo
        dd if=/dev/zero of=/foo/m0 bs=4K count=1K
        losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
        echo hello >/dev/loop2046

This causes the following to appear in syslog:

        WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 173e8b5e6a93..f9421f3e6d37 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		 * allocate a sufficiently large bvec array and may shorten the
 		 * request.
 		 */
-		if (async || user_backed_iter(iter)) {
+		if (user_backed_iter(iter)) {
 			n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
 			if (n < 0) {
 				ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 			wreq->direct_bv_count = n;
 			wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 		} else {
+			/* If this is a kernel-generated async DIO request,
+			 * assume that any resources the iterator points to
+			 * (eg. a bio_vec array) will persist till the end of
+			 * the op.
+			 */
 			wreq->iter = *iter;
 		}
 
Re: [PATCH] netfs: Fix kernel async DIO
Posted by Nicolas Baranger 1 year, 1 month ago
Hi David

Sure you can!

Please also note that after building 'linux-next' and applying the first 
patch you provided, I successfully tested DIO write (same test process as 
before).
It works fine too!

I remain available for further testing.

Thanks again for help (special thanks to Christoph and David)
Nicolas



On 2025-01-07 15:49, David Howells wrote:

> Thanks!
> 
> I ported the patch to linus/master (see below) and it looks pretty much 
> the
> same as yours, give or take tabs getting converted to spaces.
> 
> Could I put you down as a Tested-by?
> 
> David
> 
> ---
> netfs: Fix kernel async DIO
> 
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO 
> that
> is supplied with a bio_vec[] array.  Currently, because of the async 
> flag,
> this gets passed to netfs_extract_user_iter() which throws a warning 
> and
> fails because it only handles IOVEC and UBUF iterators.  This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
> 
> mount //my/cifs/share /foo
> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> echo hello >/dev/loop2046
> 
> This causes the following to appear in syslog:
> 
> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 
> netfs_extract_user_iter+0x170/0x250 [netfs]
> 
> and the write to fail.
> 
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() 
> that
> causes async kernel DIO writes to be handled as userspace writes.  Note
> that this change relies on the kernel caller maintaining the existence 
> of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
> 
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: 
> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/direct_write.c |    7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index 173e8b5e6a93..f9421f3e6d37 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct 
> kiocb *iocb, struct iov_iter *
> * allocate a sufficiently large bvec array and may shorten the
> * request.
> */
> -        if (async || user_backed_iter(iter)) {
> +        if (user_backed_iter(iter)) {
> n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
> if (n < 0) {
> ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct 
> kiocb *iocb, struct iov_iter *
> wreq->direct_bv_count = n;
> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
> } else {
> +            /* If this is a kernel-generated async DIO request,
> +             * assume that any resources the iterator points to
> +             * (eg. a bio_vec array) will persist till the end of
> +             * the op.
> +             */
> wreq->iter = *iter;
> }