fs/nfs/read.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Sometimes, when a file was read while it was being truncated by
another NFS client, the kernel could deadlock because folio_unlock()
was called twice, and the second call would XOR back the `PG_locked`
flag.
Most of the time (depending on the timing of the truncation), nobody
notices the problem because folio_unlock() gets called three times,
which flips `PG_locked` back off:
1. vfs_read, nfs_read_folio, ... nfs_read_add_folio,
nfs_return_empty_folio
2. vfs_read, nfs_read_folio, ... netfs_read_collection,
netfs_unlock_abandoned_read_pages
3. vfs_read, ... nfs_do_read_folio, nfs_read_add_folio,
nfs_return_empty_folio
The problem is that nfs_read_add_folio() is not supposed to unlock the
folio if fscache is enabled, and a nfs_netfs_folio_unlock() check is
missing in nfs_return_empty_folio().
Rarely this leads to a warning in netfs_read_collection():
------------[ cut here ]------------
R=0000031c: folio 10 is not locked
WARNING: CPU: 0 PID: 29 at fs/netfs/read_collect.c:133 netfs_read_collection+0x7c0/0xf00
[...]
Workqueue: events_unbound netfs_read_collection_worker
RIP: 0010:netfs_read_collection+0x7c0/0xf00
[...]
Call Trace:
<TASK>
netfs_read_collection_worker+0x67/0x80
process_one_work+0x12e/0x2c0
worker_thread+0x295/0x3a0
Most of the time, however, processes just get stuck forever in
folio_wait_bit_common(), waiting for `PG_locked` to disappear, which
never happens because nobody is really holding the folio lock.
Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
fs/nfs/read.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 81bd1b9aba17..3c1fa320b3f1 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -56,7 +56,8 @@ static int nfs_return_empty_folio(struct folio *folio)
{
folio_zero_segment(folio, 0, folio_size(folio));
folio_mark_uptodate(folio);
- folio_unlock(folio);
+ if (nfs_netfs_folio_unlock(folio))
+ folio_unlock(folio);
return 0;
}
--
2.47.2
On Wed, Apr 23, 2025 at 9:23 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> Sometimes, when a file was read while it was being truncated by
> another NFS client, the kernel could deadlock because folio_unlock()
> was called twice, and the second call would XOR back the `PG_locked`
> flag.
>
> Most of the time (depending on the timing of the truncation), nobody
> notices the problem because folio_unlock() gets called three times,
> which flips `PG_locked` back off:
>
> 1. vfs_read, nfs_read_folio, ... nfs_read_add_folio,
> nfs_return_empty_folio
> 2. vfs_read, nfs_read_folio, ... netfs_read_collection,
> netfs_unlock_abandoned_read_pages
> 3. vfs_read, ... nfs_do_read_folio, nfs_read_add_folio,
> nfs_return_empty_folio
>
> The problem is that nfs_read_add_folio() is not supposed to unlock the
> folio if fscache is enabled, and a nfs_netfs_folio_unlock() check is
> missing in nfs_return_empty_folio().
>
> Rarely this leads to a warning in netfs_read_collection():
>
> ------------[ cut here ]------------
> R=0000031c: folio 10 is not locked
> WARNING: CPU: 0 PID: 29 at fs/netfs/read_collect.c:133 netfs_read_collection+0x7c0/0xf00
> [...]
> Workqueue: events_unbound netfs_read_collection_worker
> RIP: 0010:netfs_read_collection+0x7c0/0xf00
> [...]
> Call Trace:
> <TASK>
> netfs_read_collection_worker+0x67/0x80
> process_one_work+0x12e/0x2c0
> worker_thread+0x295/0x3a0
>
> Most of the time, however, processes just get stuck forever in
> folio_wait_bit_common(), waiting for `PG_locked` to disappear, which
> never happens because nobody is really holding the folio lock.
>
> Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
> fs/nfs/read.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 81bd1b9aba17..3c1fa320b3f1 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -56,7 +56,8 @@ static int nfs_return_empty_folio(struct folio *folio)
> {
> folio_zero_segment(folio, 0, folio_size(folio));
> folio_mark_uptodate(folio);
> - folio_unlock(folio);
> + if (nfs_netfs_folio_unlock(folio))
> + folio_unlock(folio);
> return 0;
> }
>
> --
> 2.47.2
>
LGTM
Thanks for tracking this down, Max.
Reviewed-by: Dave Wysochanski <dwysocha@redhat.com>
© 2016 - 2025 Red Hat, Inc.