fs/netfs/direct_write.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
Hi Nicolas,
Does the attached fix your problem?
David
---
netfs: Fix kernel async DIO
Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array. Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators. This can be
triggered through a combination of cifs and a loopback blockdev with
something like:
mount //my/cifs/share /foo
dd if=/dev/zero of=/foo/m0 bs=4K count=1K
losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
echo hello >/dev/loop2046
This causes the following to appear in syslog:
WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
and the write to fail.
Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes. Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/direct_write.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index eded8afaa60b..42ce53cc216e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
* allocate a sufficiently large bvec array and may shorten the
* request.
*/
- if (async || user_backed_iter(iter)) {
+ if (user_backed_iter(iter)) {
n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
if (n < 0) {
ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
wreq->direct_bv_count = n;
wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
} else {
+ /* If this is a kernel-generated async DIO request,
+ * assume that any resources the iterator points to
+ * (eg. a bio_vec array) will persist till the end of
+ * the op.
+ */
wreq->buffer.iter = *iter;
}
}
David Howells <dhowells@redhat.com> writes:
> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
> is supplied with a bio_vec[] array. Currently, because of the async flag,
> this gets passed to netfs_extract_user_iter() which throws a warning and
> fails because it only handles IOVEC and UBUF iterators. This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
> mount //my/cifs/share /foo
> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
> causes async kernel DIO writes to be handled as userspace writes. Note
> that this change relies on the kernel caller maintaining the existence of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/direct_write.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
LGTM. Feel free to add:
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Thanks Christoph and Dave!
On Mon, Jan 06, 2025 at 11:37:24AM +0000, David Howells wrote: > mount //my/cifs/share /foo > dd if=/dev/zero of=/foo/m0 bs=4K count=1K > losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0 > echo hello >/dev/loop2046 Can you add a testcase using losetup --direct-io with a file on $TEST_DIR so that we get coverage for ITER_BVEC directio to xfstests?
Hi Christoph
Sorry to contact you again but last time you and David H. help me a lot
with 'kernel async DIO' / 'Losetup Direct I/O breaks BACK-FILE
filesystem on CIFS share (Appears in Linux 6.10 and reproduced on
mainline)'
I don't know if it had already been reported but after building Linux
6.14-rc1 I constat the following behaviour:
'cat' command is going on a loop when I cat a file which reside on cifs
share
And so 'cp' command does the same: it copy the content of a file on cifs
share and loop writing it to the destination
I did test with a file named 'toto' and containing only ascii string
'toto'.
When I started copying it from cifs share to local filesystem, I had to
CTRL+C the copy of this 5 bytes file after some time because the
destination file was using all the filesystem free space and containing
billions of 'toto' lines
Here is an example with cat:
CIFS SHARE is mounted as /mnt/fbx/FBX-24T
CIFS mount options:
grep cifs /proc/mounts
//10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs
rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=none,upcall_target=app,username=fbx,domain=HOMELAN,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.10.100,file_mode=0666,dir_mode=0755,iocharset=utf8,soft,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,nativesocket,symlink=mfsymlinks,rsize=65536,wsize=65536,bsize=16777216,retrans=1,echo_interval=60,actimeo=1,closetimeo=1
0 0
KERNEL: uname -a
Linux 14RV-SERVER.14rv.lan 6.14.0.1-ast-rc2-amd64 #0 SMP PREEMPT_DYNAMIC
Wed Feb 12 18:23:00 CET 2025 x86_64 GNU/Linux
To be reproduced:
echo toto >/mnt/fbx/FBX-24T/toto
ls -l /mnt/fbx/FBX-24T/toto
-rw-rw-rw- 1 root root 5 20 mars 09:20 /mnt/fbx/FBX-24T/toto
cat /mnt/fbx/FBX-24T/toto
toto
toto
toto
toto
toto
toto
toto
^C
strace cat /mnt/fbx/FBX-24T/toto
execve("/usr/bin/cat", ["cat", "/mnt/fbx/FBX-24T/toto"], 0x7ffc39b41848
/* 19 vars */) = 0
brk(NULL) = 0x55755b1c1000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f55f95d6000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (Aucun fichier ou
dossier de ce type)
openat(AT_FDCWD, "glibc-hwcaps/x86-64-v3/libc.so.6", O_RDONLY|O_CLOEXEC)
= -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "glibc-hwcaps/x86-64-v2/libc.so.6", O_RDONLY|O_CLOEXEC)
= -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) =
-1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1
ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun
fichier ou dossier de ce type)
openat(AT_FDCWD, "haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1
ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun
fichier ou dossier de ce type)
openat(AT_FDCWD,
"/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v3/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD,
"/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v3", 0x7fff25937800, 0)
= -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD,
"/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v2/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD,
"/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v2", 0x7fff25937800, 0)
= -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD,
"/usr/local/cuda-12.6/lib64/tls/haswell/x86_64/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell/x86_64",
0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell",
0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/x86_64/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/x86_64",
0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls", 0x7fff25937800,
0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell/x86_64/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell/x86_64",
0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell",
0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/x86_64/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/x86_64",
0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/libc.so.6",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64",
{st_mode=S_IFDIR|S_ISGID|0755, st_size=4570, ...}, 0) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=148466, ...},
AT_EMPTY_PATH) = 0
mmap(NULL, 148466, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f55f95b1000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC)
= 3
read(3,
"\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20t\2\0\0\0\0\0"...,
832) = 832
pread64(3,
"\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784,
64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1922136, ...},
AT_EMPTY_PATH) = 0
pread64(3,
"\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784,
64) = 784
mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x7f55f93d0000
mmap(0x7f55f93f6000, 1396736, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f55f93f6000
mmap(0x7f55f954b000, 339968, PROT_READ,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f55f954b000
mmap(0x7f55f959e000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f55f959e000
mmap(0x7f55f95a4000, 53072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f55f95a4000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f55f93cd000
arch_prctl(ARCH_SET_FS, 0x7f55f93cd740) = 0
set_tid_address(0x7f55f93cda10) = 38427
set_robust_list(0x7f55f93cda20, 24) = 0
rseq(0x7f55f93ce060, 0x20, 0, 0x53053053) = 0
mprotect(0x7f55f959e000, 16384, PROT_READ) = 0
mprotect(0x55754475e000, 4096, PROT_READ) = 0
mprotect(0x7f55f960e000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f55f95b1000, 148466) = 0
getrandom("\x19\x6b\x9e\x55\x7e\x09\x74\x5f", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0x55755b1c1000
brk(0x55755b1e2000) = 0x55755b1e2000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) =
3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=3048928, ...},
AT_EMPTY_PATH) = 0
mmap(NULL, 3048928, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f55f9000000
close(3) = 0
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...},
AT_EMPTY_PATH) = 0
openat(AT_FDCWD, "/mnt/fbx/FBX-24T/toto", O_RDONLY) = 3
newfstatat(3, "", {st_mode=S_IFREG|0666, st_size=5, ...}, AT_EMPTY_PATH)
= 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 16785408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x7f55f7ffe000
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
) = 16711680
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
) = 16711680
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
) = 16711680
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
) = 16711680
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
) = 16711680
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
) = 16711680
read(3,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16777216) = 16711680
write(1,
"toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
16711680toto
^Cstrace: Process 38427 detached
<detached ...>
Please let me know if it had already been fixed or reported and if
you're able to reproduce this issue.
Thanks for help
Kind regards
Nicolas Baranger
Hi David
Thanks for the job !
I will buid Linux 6.10 and mainline with the provided change and I'm
comming here as soon as I get results from tests (CET working time).
Thanks again for help in this issue
Nicolas
Le 2025-01-06 12:37, David Howells a écrit :
> Hi Nicolas,
>
> Does the attached fix your problem?
>
> David
> ---
> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO
> that
> is supplied with a bio_vec[] array. Currently, because of the async
> flag,
> this gets passed to netfs_extract_user_iter() which throws a warning
> and
> fails because it only handles IOVEC and UBUF iterators. This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
> mount //my/cifs/share /foo
> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50
> netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked()
> that
> causes async kernel DIO writes to be handled as userspace writes. Note
> that this change relies on the kernel caller maintaining the existence
> of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes:
> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/direct_write.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index eded8afaa60b..42ce53cc216e 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
> kiocb *iocb, struct iov_iter *
> * allocate a sufficiently large bvec array and may shorten the
> * request.
> */
> - if (async || user_backed_iter(iter)) {
> + if (user_backed_iter(iter)) {
> n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
> if (n < 0) {
> ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
> kiocb *iocb, struct iov_iter *
> wreq->direct_bv_count = n;
> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
> } else {
> + /* If this is a kernel-generated async DIO request,
> + * assume that any resources the iterator points to
> + * (eg. a bio_vec array) will persist till the end of
> + * the op.
> + */
> wreq->buffer.iter = *iter;
> }
> }
Hi David
As your patch was written on top on linux-next I was required to make
some small modifications to make it work on mainline (6.13-rc6).
The following patch is working fine for me on mainline, but i think it
would be better to wait for your confirmation / validation (or new
patch) before applying it on production.
#-------- PATCH --------#
diff --git a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
b/linux-6.13-rc6/fs/netfs/direct_write.c
index 88f2adf..94a1ee8 100644
--- a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
+++ b/linux-6.13-rc6/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
kiocb *iocb, struct iov_iter *
* allocate a sufficiently large bvec array and may
shorten the
* request.
*/
- if (async || user_backed_iter(iter)) {
+ if (user_backed_iter(iter)) {
n = netfs_extract_user_iter(iter, len,
&wreq->iter, 0);
if (n < 0) {
ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
kiocb *iocb, struct iov_iter *
wreq->direct_bv_count = n;
wreq->direct_bv_unpin =
iov_iter_extract_will_pin(iter);
} else {
+ /* If this is a kernel-generated async DIO
request,
+ * assume that any resources the iterator points
to
+ * (eg. a bio_vec array) will persist till the
end of
+ * the op.
+ */
wreq->iter = *iter;
}
#-------- TESTS --------#
Using this patch Linux 6.13-rc6 build with no error and '--direct-io=on'
is working :
18:38:47 root@deb12-lab-10d:~# uname -a
Linux deb12-lab-10d.lab.lan 6.13.0-rc6-amd64 #0 SMP PREEMPT_DYNAMIC Mon
Jan 6 18:14:07 CET 2025 x86_64 GNU/Linux
18:39:29 root@deb12-lab-10d:~# losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
DIO LOG-SEC
/dev/loop2046 0 0 0 0
/mnt/FBX24T/FS-LAN/bckcrypt2046 1 4096
18:39:32 root@deb12-lab-10d:~# dmsetup ls | grep bckcrypt
bckcrypt (254:7)
18:39:55 root@deb12-lab-10d:~# cryptsetup status bckcrypt
/dev/mapper/bckcrypt is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
key location: keyring
device: /dev/loop2046
loop: /mnt/FBX24T/FS-LAN/bckcrypt2046
sector size: 512
offset: 32768 sectors
size: 8589901824 sectors
mode: read/write
18:40:36 root@deb12-lab-10d:~# df -h | egrep 'cifs|bckcrypt'
//10.0.10.100/FBX24T cifs 22T 13T 9,0T 60% /mnt/FBX24T
/dev/mapper/bckcrypt btrfs 4,0T 3,3T 779G 82%
/mnt/bckcrypt
09:08:44 root@deb12-lab-10d:~# LANG=en_US.UTF-8
09:08:46 root@deb12-lab-10d:~# dd if=/dev/zero
of=/mnt/bckcrypt/test/test.dd bs=256M count=16 oflag=direct
status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14 s, 302 MB/s
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14.2061 s, 302 MB/s
No write errors using '--direct-io=on' option of losetup with this patch
=> writing to the back-file is more than 20x faster ...
It seems to be ok !
Let me know if something's wrong in this patch or if it can safely be
used in production.
Again thanks everyone for help.
Nicolas
Le 2025-01-06 13:07, nicolas.baranger@3xo.fr a écrit :
> Hi David
>
> Thanks for the job !
> I will buid Linux 6.10 and mainline with the provided change and I'm
> comming here as soon as I get results from tests (CET working time).
>
> Thanks again for help in this issue
> Nicolas
>
> Le 2025-01-06 12:37, David Howells a écrit :
>
>> Hi Nicolas,
>>
>> Does the attached fix your problem?
>>
>> David
>> ---
>> netfs: Fix kernel async DIO
>>
>> Netfslib needs to be able to handle kernel-initiated asynchronous DIO
>> that
>> is supplied with a bio_vec[] array. Currently, because of the async
>> flag,
>> this gets passed to netfs_extract_user_iter() which throws a warning
>> and
>> fails because it only handles IOVEC and UBUF iterators. This can be
>> triggered through a combination of cifs and a loopback blockdev with
>> something like:
>>
>> mount //my/cifs/share /foo
>> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>> echo hello >/dev/loop2046
>>
>> This causes the following to appear in syslog:
>>
>> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50
>> netfs_extract_user_iter+0x170/0x250 [netfs]
>>
>> and the write to fail.
>>
>> Fix this by removing the check in netfs_unbuffered_write_iter_locked()
>> that
>> causes async kernel DIO writes to be handled as userspace writes.
>> Note
>> that this change relies on the kernel caller maintaining the existence
>> of
>> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>>
>> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
>> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
>> Closes:
>> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Steve French <smfrench@gmail.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: netfs@lists.linux.dev
>> cc: linux-cifs@vger.kernel.org
>> cc: linux-fsdevel@vger.kernel.org
>> ---
>> fs/netfs/direct_write.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
>> index eded8afaa60b..42ce53cc216e 100644
>> --- a/fs/netfs/direct_write.c
>> +++ b/fs/netfs/direct_write.c
>> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
>> kiocb *iocb, struct iov_iter *
>> * allocate a sufficiently large bvec array and may shorten the
>> * request.
>> */
>> - if (async || user_backed_iter(iter)) {
>> + if (user_backed_iter(iter)) {
>> n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
>> if (n < 0) {
>> ret = n;
>> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
>> kiocb *iocb, struct iov_iter *
>> wreq->direct_bv_count = n;
>> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
>> } else {
>> + /* If this is a kernel-generated async DIO request,
>> + * assume that any resources the iterator points to
>> + * (eg. a bio_vec array) will persist till the end of
>> + * the op.
>> + */
>> wreq->buffer.iter = *iter;
>> }
>> }
Thanks!
I ported the patch to linus/master (see below) and it looks pretty much the
same as yours, give or take tabs getting converted to spaces.
Could I put you down as a Tested-by?
David
---
netfs: Fix kernel async DIO
Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array. Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators. This can be
triggered through a combination of cifs and a loopback blockdev with
something like:
mount //my/cifs/share /foo
dd if=/dev/zero of=/foo/m0 bs=4K count=1K
losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
echo hello >/dev/loop2046
This causes the following to appear in syslog:
WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
and the write to fail.
Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes. Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/direct_write.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 173e8b5e6a93..f9421f3e6d37 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
* allocate a sufficiently large bvec array and may shorten the
* request.
*/
- if (async || user_backed_iter(iter)) {
+ if (user_backed_iter(iter)) {
n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
if (n < 0) {
ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
wreq->direct_bv_count = n;
wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
} else {
+ /* If this is a kernel-generated async DIO request,
+ * assume that any resources the iterator points to
+ * (eg. a bio_vec array) will persist till the end of
+ * the op.
+ */
wreq->iter = *iter;
}
Hi David
Sure you can !
Please also note that after building 'linux-next' and applying the first
patch you provide I sucessfully test DIO write (same test process as
before).
It works fine too !
I stay availiable for further testing
Thanks again for help (special thanks to Christoph and David)
Nicolas
Le 2025-01-07 15:49, David Howells a écrit :
> Thanks!
>
> I ported the patch to linus/master (see below) and it looks pretty much
> the
> same as yours, give or take tabs getting converted to spaces.
>
> Could I put you down as a Tested-by?
>
> David
>
> ---
> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO
> that
> is supplied with a bio_vec[] array. Currently, because of the async
> flag,
> this gets passed to netfs_extract_user_iter() which throws a warning
> and
> fails because it only handles IOVEC and UBUF iterators. This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
> mount //my/cifs/share /foo
> dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50
> netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked()
> that
> causes async kernel DIO writes to be handled as userspace writes. Note
> that this change relies on the kernel caller maintaining the existence
> of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes:
> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/direct_write.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index 173e8b5e6a93..f9421f3e6d37 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
> kiocb *iocb, struct iov_iter *
> * allocate a sufficiently large bvec array and may shorten the
> * request.
> */
> - if (async || user_backed_iter(iter)) {
> + if (user_backed_iter(iter)) {
> n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
> if (n < 0) {
> ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct
> kiocb *iocb, struct iov_iter *
> wreq->direct_bv_count = n;
> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
> } else {
> + /* If this is a kernel-generated async DIO request,
> + * assume that any resources the iterator points to
> + * (eg. a bio_vec array) will persist till the end of
> + * the op.
> + */
> wreq->iter = *iter;
> }
© 2016 - 2026 Red Hat, Inc.