From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B781430674C for ; Tue, 12 May 2026 12:34:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589270; cv=none; b=KflDU7EVTRzaadrBiOlaBXEuZRpdPRwLiMHoxUAmLM0MMSQsrX80aPwfuypTmvGqiD50w/NtnHLn762BuMe+FsachWxd94/yOgD/ZRfch44DslQ00JpNnTnjxfvDruZKbw3R9QbMonHfNXfLYEy20Hy1MyX31R5jPeRiiLk4KAE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589270; c=relaxed/simple; bh=Brs0kwP84QRaV4MHAGIyz9y4w1c4V6+0EEydrI5/MSs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ct/NEjiorJ9u9QNTPSgHX9Rwz5vFQhf/GcQsyJvXiBKTcWTvPbO+wNrj8uiZdXECYV9q6G2b1d6XkI/QgJ5kMehyuVSDRVIO1ZBzrLzIqjwn1JW8OEfCdw3ijWqUGc5JwK9qjl0snh1vUr1d8+g3mUeEuZwhkzTIQGsXabJKmBY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=NQdR9M/Q; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NQdR9M/Q" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589259; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=nCGquE5C3lrbKnUoOKL/FjbW/HDBNUgjQp9a1As8DkE=; b=NQdR9M/QtOlBThxwCchZ+xOHZigQJFIfSUm0Yi1B6grmj6hsW2Uzg782KU3XTQAjFUavmz eRHFu/XRrNAzv55uaz2NWwnjIgvZIurEsTd4iGXf8S0/+xvIJfrDRVCSMibhsJkTvkfq3e 5GZG0ZLMI85aabnl9g16knxKE0w1JNY= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-426-hys62yCVO-OdT-Iqe8N8Yg-1; Tue, 12 May 2026 08:34:18 -0400 X-MC-Unique: hys62yCVO-OdT-Iqe8N8Yg-1 X-Mimecast-MFC-AGG-ID: hys62yCVO-OdT-Iqe8N8Yg_1778589257 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E9CA7195609F; Tue, 12 May 2026 12:34:15 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B870A1955D84; Tue, 12 May 2026 12:34:12 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 01/24] netfs: Fix cancellation of a DIO and single read subrequests Date: Tue, 12 May 2026 13:33:38 +0100 Message-ID: <20260512123404.719402-2-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable
X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17
Content-Type: text/plain; charset="utf-8"

When the preparation of a new subrequest for a read fails, if the
subrequest has already been added to the stream->subrequests list, it
can't simply be put and abandoned as the collector may see it.  Also, if
it hasn't been queued yet, it has two outstanding refs that both need to
be put.

Both DIO-read and single-read dispatch get this wrong; further, both
perform their steps in a different order from buffered read.

Fix cancellation of both DIO-read and single-read subrequests that failed
preparation by the following steps:

 (1) Harmonise all three reads (buffered, dio, single) to queue the subreq
     before prepping it.

 (2) Make all three call netfs_queue_read() to do the queuing.

 (3) Set NETFS_RREQ_ALL_QUEUED independently of the queuing as we don't
     know the length of the subreq at this point.

 (4) In all cases, set the error and NETFS_SREQ_FAILED flag on the subreq
     and then call netfs_read_subreq_terminated() to deal with it.  This
     will pass responsibility off to the collector for dealing with it.
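
The four steps above can be sketched as a small userspace model. This is
only an illustration of the "queue first, then cancel through the normal
termination path" ordering; every model_* name is hypothetical, the list
and flags are plain C stand-ins rather than the kernel structures, and
locking, barriers and refcounting are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the kernel types. */
struct model_subreq {
	struct model_subreq *next;
	int  error;
	bool failed;       /* models NETFS_SREQ_FAILED */
	bool in_progress;  /* models NETFS_SREQ_IN_PROGRESS */
};

struct model_rreq {
	struct model_subreq *head, *tail;  /* models stream->subrequests */
	bool all_queued;                   /* models NETFS_RREQ_ALL_QUEUED */
	int  error;                        /* first error the collector saw */
	int  collected;                    /* subreqs the collector retired */
};

/* Steps (1)/(2): queue before preparing, via one shared helper. */
static void model_queue_read(struct model_rreq *rreq, struct model_subreq *sub)
{
	assert(!sub->in_progress);
	sub->next = NULL;
	sub->in_progress = true;
	if (rreq->tail)
		rreq->tail->next = sub;
	else
		rreq->head = sub;
	rreq->tail = sub;
}

/* Termination: clear in-progress so the collector may retire the subreq. */
static void model_subreq_terminated(struct model_subreq *sub)
{
	sub->in_progress = false;
}

/* Step (4): record the error, flag the failure, then terminate normally. */
static void model_cancel_read(struct model_subreq *sub, int error)
{
	sub->error = error;
	sub->failed = true;
	model_subreq_terminated(sub);
}

/* Collector: retire finished subreqs from the front of the queue. */
static void model_collect(struct model_rreq *rreq)
{
	while (rreq->head && !rreq->head->in_progress) {
		struct model_subreq *sub = rreq->head;

		if (sub->failed && !rreq->error)
			rreq->error = sub->error;
		rreq->head = sub->next;
		if (!rreq->head)
			rreq->tail = NULL;
		rreq->collected++;
	}
}

/* Dispatch one subreq; prep_ret models the result of ->prepare_read(). */
static int model_dispatch(struct model_rreq *rreq, struct model_subreq *sub,
			  int prep_ret)
{
	model_queue_read(rreq, sub);
	if (prep_ret < 0)
		model_cancel_read(sub, prep_ret);
	else
		model_subreq_terminated(sub); /* pretend the I/O completed */
	rreq->all_queued = true;  /* step (3): set separately from queuing */
	model_collect(rreq);
	return prep_ret;
}
```

In this model, calling model_dispatch() with a negative prep_ret leaves
the failed subreq to be retired by model_collect(), which also records
the error on the request, rather than having the dispatcher drop it
directly.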
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_read.c | 34 +++++++++++++------------------- fs/netfs/direct_read.c | 42 +++++++++++++--------------------------- fs/netfs/internal.h | 3 +++ fs/netfs/read_collect.c | 11 +++++++++++ fs/netfs/read_single.c | 23 ++++++++++------------ 5 files changed, 50 insertions(+), 63 deletions(-) diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index a8c0d86118c5..a27ed501b6d4 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -156,9 +156,8 @@ static void netfs_read_cache_to_pagecache(struct netfs_io_request *rreq, netfs_cache_read_terminated, subreq); } =20 -static void netfs_queue_read(struct netfs_io_request *rreq, - struct netfs_io_subrequest *subreq, - bool last_subreq) +void netfs_queue_read(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) { struct netfs_io_stream *stream =3D &rreq->io_streams[0]; =20 @@ -178,11 +177,6 @@ static void netfs_queue_read(struct netfs_io_request *rreq, } } =20 - if (last_subreq) { - smp_wmb(); /* Write lists before ALL_QUEUED. 
*/ - set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); - } - spin_unlock(&rreq->lock); } =20 @@ -233,6 +227,8 @@ static void netfs_read_to_pagecache(struct netfs_io_req= uest *rreq, subreq->start =3D start; subreq->len =3D size; =20 + netfs_queue_read(rreq, subreq); + source =3D netfs_cache_prepare_read(rreq, subreq, rreq->i_size); subreq->source =3D source; if (source =3D=3D NETFS_DOWNLOAD_FROM_SERVER) { @@ -253,6 +249,7 @@ static void netfs_read_to_pagecache(struct netfs_io_req= uest *rreq, rreq->debug_id, subreq->debug_index, subreq->len, size, subreq->start, ictx->zero_point, rreq->i_size); + netfs_cancel_read(subreq, ret); break; } subreq->len =3D len; @@ -261,12 +258,7 @@ static void netfs_read_to_pagecache(struct netfs_io_re= quest *rreq, if (rreq->netfs_ops->prepare_read) { ret =3D rreq->netfs_ops->prepare_read(subreq); if (ret < 0) { - subreq->error =3D ret; - /* Not queued - release both refs. */ - netfs_put_subrequest(subreq, - netfs_sreq_trace_put_cancel); - netfs_put_subrequest(subreq, - netfs_sreq_trace_put_cancel); + netfs_cancel_read(subreq, ret); break; } trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); @@ -289,23 +281,23 @@ static void netfs_read_to_pagecache(struct netfs_io_r= equest *rreq, =20 pr_err("Unexpected read source %u\n", source); WARN_ON_ONCE(1); + netfs_cancel_read(subreq, ret); break; =20 issue: slice =3D netfs_prepare_read_iterator(subreq, ractl); if (slice < 0) { ret =3D slice; - subreq->error =3D ret; - trace_netfs_sreq(subreq, netfs_sreq_trace_cancel); - /* Not queued - release both refs. */ - netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel); - netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel); + netfs_cancel_read(subreq, ret); break; } - size -=3D slice; start +=3D slice; + size -=3D slice; + if (size <=3D 0) { + smp_wmb(); /* Write lists before ALL_QUEUED. 
*/ + set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); + } =20 - netfs_queue_read(rreq, subreq, size <=3D 0); netfs_issue_read(rreq, subreq); cond_resched(); } while (size > 0); diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c index f72e6da88cca..6a8fb0d55e04 100644 --- a/fs/netfs/direct_read.c +++ b/fs/netfs/direct_read.c @@ -45,12 +45,11 @@ static void netfs_prepare_dio_read_iterator(struct netf= s_io_subrequest *subreq) * Perform a read to a buffer from the server, slicing up the region to be= read * according to the network rsize. */ -static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq) +static void netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq) { - struct netfs_io_stream *stream =3D &rreq->io_streams[0]; unsigned long long start =3D rreq->start; ssize_t size =3D rreq->len; - int ret =3D 0; + int ret; =20 do { struct netfs_io_subrequest *subreq; @@ -58,7 +57,10 @@ static int netfs_dispatch_unbuffered_reads(struct netfs_= io_request *rreq) =20 subreq =3D netfs_alloc_subrequest(rreq); if (!subreq) { - ret =3D -ENOMEM; + /* Stash the error in the request if there's not + * already an error set. 
+ */ + cmpxchg(&rreq->error, 0, -ENOMEM); break; } =20 @@ -66,25 +68,13 @@ static int netfs_dispatch_unbuffered_reads(struct netfs= _io_request *rreq) subreq->start =3D start; subreq->len =3D size; =20 - __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags); - - spin_lock(&rreq->lock); - list_add_tail(&subreq->rreq_link, &stream->subrequests); - if (list_is_first(&subreq->rreq_link, &stream->subrequests)) { - if (!stream->active) { - stream->collected_to =3D subreq->start; - /* Store list pointers before active flag */ - smp_store_release(&stream->active, true); - } - } - trace_netfs_sreq(subreq, netfs_sreq_trace_added); - spin_unlock(&rreq->lock); + netfs_queue_read(rreq, subreq); =20 netfs_stat(&netfs_n_rh_download); if (rreq->netfs_ops->prepare_read) { ret =3D rreq->netfs_ops->prepare_read(subreq); if (ret < 0) { - netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel); + netfs_cancel_read(subreq, ret); break; } } @@ -113,8 +103,6 @@ static int netfs_dispatch_unbuffered_reads(struct netfs= _io_request *rreq) set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); netfs_wake_collector(rreq); } - - return ret; } =20 /* @@ -137,21 +125,17 @@ static ssize_t netfs_unbuffered_read(struct netfs_io_= request *rreq, bool sync) // TODO: Use bounce buffer if requested =20 inode_dio_begin(rreq->inode); + netfs_dispatch_unbuffered_reads(rreq); =20 - ret =3D netfs_dispatch_unbuffered_reads(rreq); - - if (!rreq->submitted) { - netfs_put_request(rreq, netfs_rreq_trace_put_no_submit); - inode_dio_end(rreq->inode); - ret =3D 0; - goto out; - } + /* The collector will get run, even if we don't manage to submit any + * subreqs, so we shouldn't call inode_dio_end() here. 
+ */ =20 if (sync) ret =3D netfs_wait_for_read(rreq); else ret =3D -EIOCBQUEUED; -out: + _leave(" =3D %zd", ret); return ret; } diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index d436e20d3418..645996ecfc80 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -23,6 +23,8 @@ /* * buffered_read.c */ +void netfs_queue_read(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq); void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error); int netfs_prefetch_for_write(struct file *file, struct folio *folio, size_t offset, size_t len); @@ -108,6 +110,7 @@ static inline void netfs_see_subrequest(struct netfs_io= _subrequest *subreq, */ bool netfs_read_collection(struct netfs_io_request *rreq); void netfs_read_collection_worker(struct work_struct *work); +void netfs_cancel_read(struct netfs_io_subrequest *subreq, int error); void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error); =20 /* diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index e5f6665b3341..d2d902f46627 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -575,6 +575,17 @@ void netfs_read_subreq_terminated(struct netfs_io_subr= equest *subreq) } EXPORT_SYMBOL(netfs_read_subreq_terminated); =20 +/* + * Cancel a read subrequest due to preparation failure. + */ +void netfs_cancel_read(struct netfs_io_subrequest *subreq, int error) +{ + trace_netfs_sreq(subreq, netfs_sreq_trace_cancel); + subreq->error =3D error; + __set_bit(NETFS_SREQ_FAILED, &subreq->flags); + netfs_read_subreq_terminated(subreq); +} + /* * Handle termination of a read from the cache. 
*/ diff --git a/fs/netfs/read_single.c b/fs/netfs/read_single.c index d0e23bc42445..8833550d2eb6 100644 --- a/fs/netfs/read_single.c +++ b/fs/netfs/read_single.c @@ -89,7 +89,6 @@ static void netfs_single_read_cache(struct netfs_io_reque= st *rreq, */ static int netfs_single_dispatch_read(struct netfs_io_request *rreq) { - struct netfs_io_stream *stream =3D &rreq->io_streams[0]; struct netfs_io_subrequest *subreq; int ret =3D 0; =20 @@ -102,14 +101,7 @@ static int netfs_single_dispatch_read(struct netfs_io_= request *rreq) subreq->len =3D rreq->len; subreq->io_iter =3D rreq->buffer.iter; =20 - __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags); - - spin_lock(&rreq->lock); - list_add_tail(&subreq->rreq_link, &stream->subrequests); - trace_netfs_sreq(subreq, netfs_sreq_trace_added); - /* Store list pointers before active flag */ - smp_store_release(&stream->active, true); - spin_unlock(&rreq->lock); + netfs_queue_read(rreq, subreq); =20 netfs_single_cache_prepare_read(rreq, subreq); switch (subreq->source) { @@ -121,10 +113,14 @@ static int netfs_single_dispatch_read(struct netfs_io= _request *rreq) goto cancel; } =20 + smp_wmb(); /* Write lists before ALL_QUEUED. */ + set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); rreq->netfs_ops->issue_read(subreq); rreq->submitted +=3D subreq->len; break; case NETFS_READ_FROM_CACHE: + smp_wmb(); /* Write lists before ALL_QUEUED. */ + set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); trace_netfs_sreq(subreq, netfs_sreq_trace_submit); netfs_single_read_cache(rreq, subreq); rreq->submitted +=3D subreq->len; @@ -134,14 +130,15 @@ static int netfs_single_dispatch_read(struct netfs_io= _request *rreq) pr_warn("Unexpected single-read source %u\n", subreq->source); WARN_ON_ONCE(true); ret =3D -EIO; - break; + goto cancel; } =20 - smp_wmb(); /* Write lists before ALL_QUEUED. 
*/ - set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); return ret; cancel: - netfs_put_subrequest(subreq, netfs_sreq_trace_put_cancel); + netfs_cancel_read(subreq, ret); + smp_wmb(); /* Write lists before ALL_QUEUED. */ + set_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags); + netfs_wake_collector(rreq); return ret; } From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A639E306748 for ; Tue, 12 May 2026 12:34:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589282; cv=none; b=WBLLV+ol+LpUVFW5AI4tyTrqavfbAndo/uyY2uhA+DQdi3dNE3SlLMto09/ss2DRw80Ajz/Y5u3VYuTgx97+GRiCQhpMO/Hx9D86iQ/IGDb2+//qTrTF8h17jxNSKATEkedupRWDfg1wKxwQD9pLfZYg1ICT1MlScwv+C0RmEC4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589282; c=relaxed/simple; bh=xKMwK6XyTR2S1NYi1ll51KBolxqfDxdt7e/1pyWESn8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bMopRxsK6sOpUM7dgVbcf94ZIELhjUwH8nZdyCQMvXgFKdhi25VBifsO3Z0e/qI9UrayB4X7E81jUZ70Hn0+FE7+nkKyiUbZpF4A/LYPoi6ep3ep23/e7xdqwsnqDPPyQijWJML2fj+Ir9kssvPZAiA1abj/w3G9Lvb8duLCpwU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ESDwvOU9; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com 
header.i=@redhat.com header.b="ESDwvOU9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589265; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I9C1Tq0bWPjxQKHy18dUxRo+Gdg144k5WV4DnkH5buY=; b=ESDwvOU9H30pCG26xCkdugXSC2NuEzMpQ3Gy1FjknwXEuJ60/wJCTV4cNA4en+SFFLKln/ r+cU6DF0wct7P3WgJvyKplWjjUf4LeNXTvPkpytVLYnYyUlnAoLvAnf/n64zXjvQvYSzzq tGultgAj02XgepFvoOvK5Inl4eeHcyY= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-550-bwxtmLZrM4-IwKg61xVfcA-1; Tue, 12 May 2026 08:34:22 -0400 X-MC-Unique: bwxtmLZrM4-IwKg61xVfcA-1 X-Mimecast-MFC-AGG-ID: bwxtmLZrM4-IwKg61xVfcA_1778589261 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D7C5A19560B1; Tue, 12 May 2026 12:34:20 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 906F21944B0C; Tue, 12 May 2026 12:34:17 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 02/24] netfs: Fix missing locking around retry adding new subreqs Date: Tue, 12 May 2026 13:33:39 +0100 
Message-ID: <20260512123404.719402-3-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" Fix netfs_retry_read_subrequests() and netfs_retry_write_stream() to take the appropriate lock when adding extra subrequests into, and deleting superfluous ones from, stream->subrequests. Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/read_retry.c | 6 +++++- fs/netfs/write_retry.c | 6 +++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c index cca9ac43c077..5ec548b996d6 100644 --- a/fs/netfs/read_retry.c +++ b/fs/netfs/read_retry.c @@ -175,7 +175,9 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) list_for_each_entry_safe_from(subreq, tmp, &stream->subrequests, rreq_link) { trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous); + spin_lock(&rreq->lock); list_del(&subreq->rreq_link); + spin_unlock(&rreq->lock); netfs_put_subrequest(subreq, netfs_sreq_trace_put_done); if (subreq =3D=3D to) break; @@ -203,8 +205,10 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) refcount_read(&subreq->ref), netfs_sreq_trace_new); =20 + spin_lock(&rreq->lock); list_add(&subreq->rreq_link, &to->rreq_link); - to =3D list_next_entry(to, rreq_link); + spin_unlock(&rreq->lock); + to =3D subreq; trace_netfs_sreq(subreq, netfs_sreq_trace_retry); =20 stream->sreq_max_len =3D 
umin(len, rreq->rsize); diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c index 29489a23a220..32735abfa03f 100644 --- a/fs/netfs/write_retry.c +++ b/fs/netfs/write_retry.c @@ -130,7 +130,9 @@ static void netfs_retry_write_stream(struct netfs_io_re= quest *wreq, list_for_each_entry_safe_from(subreq, tmp, &stream->subrequests, rreq_link) { trace_netfs_sreq(subreq, netfs_sreq_trace_discard); + spin_lock(&wreq->lock); list_del(&subreq->rreq_link); + spin_unlock(&wreq->lock); netfs_put_subrequest(subreq, netfs_sreq_trace_put_done); if (subreq =3D=3D to) break; @@ -153,8 +155,10 @@ static void netfs_retry_write_stream(struct netfs_io_r= equest *wreq, netfs_sreq_trace_new); trace_netfs_sreq(subreq, netfs_sreq_trace_split); =20 + spin_lock(&wreq->lock); list_add(&subreq->rreq_link, &to->rreq_link); - to =3D list_next_entry(to, rreq_link); + spin_unlock(&wreq->lock); + to =3D subreq; trace_netfs_sreq(subreq, netfs_sreq_trace_retry); =20 stream->sreq_max_len =3D len; From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0BC69306774 for ; Tue, 12 May 2026 12:34:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589282; cv=none; b=QfB0VO6qATD25JY41c9rVq0lsAXsn15Kc9cHuqnSVuSVCi5Y1iSRWo6PLyojZnRa7EGA6A66Wp8kiKIRwEUOj3r4OBoTtNzpnWck1Z+0erYejB24WjEIRkfW9yeGrjO8w2g0LIoFPUnLsPSZf5FgklodjOHoCNis6CWRygtpioQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589282; c=relaxed/simple; bh=3bqDWWNHQCp2k1HcURADyN1jWvBXjOQMZwOWDtu+cGM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=WCJWcSMVJsNWyBfAck3+DF0BnZBx0ak3tBUYIMLEnZ1qn7QFAKqU8DMEVLLTi1ncx2MduXej9FBYp/X6CcoXsUzirmy97MCSuQyjK6EYRSjjE9hy9f1fs+7v9zKHekZaj3MvR3KbQYHZ/fmIzE4KTx53mhXzkk/I3P4wtKmOJh4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=KF/nbR61; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="KF/nbR61" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589273; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E1xSzjtYNhafK6Xn+CKO9sIxYDmd2LbJIDZwNvhETDc=; b=KF/nbR61H5N1wAE/zYhuhCrXhiTPiRjZMl/5A3L13rb5pih8PGzXnPR60QsG1FloDW/VHL NoY1FhSfgVKnkFg/ytMsizOG5WQkqxKfD9tMAxKOKgMY1xAzPwTX1qoJJlUflIA8d/CHqv XotyIVl4G6OWJEefQ6x1g586LdtZk+Q= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-93-R2wWhPxJNOyZyqs1kVxQPw-1; Tue, 12 May 2026 08:34:28 -0400 X-MC-Unique: R2wWhPxJNOyZyqs1kVxQPw-1 X-Mimecast-MFC-AGG-ID: R2wWhPxJNOyZyqs1kVxQPw_1778589266 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) 
(No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EFC4119560A1; Tue, 12 May 2026 12:34:25 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C6CE219560A2; Tue, 12 May 2026 12:34:22 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 03/24] netfs: Fix missing barriers when accessing stream->subrequests locklessly Date: Tue, 12 May 2026 13:33:40 +0100 Message-ID: <20260512123404.719402-4-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" The list of subrequests attached to stream->subrequests is accessed without locks by netfs_collect_read_results() and netfs_collect_write_results(), and then they access subreq->flags without taking a barrier after getting the subreq pointer from the list. Relatedly, the functions that build the list don't use any sort of write barrier when constructing the list to make sure that the NETFS_SREQ_IN_PROGRESS flag is perceived to be set first if no lock is taken. Fix this by: (1) Add a new list_add_tail_release() function that uses a release barrier to set the pointer to the new member of the list. (2) Add a new list_first_entry_or_null_acquire() function that uses an acquire barrier to read the pointer to the first member in a list (or return NULL). 
(3) Use list_add_tail_release() when adding a subreq to ->subrequests. (4) Use list_first_entry_or_null_acquire() when initially accessing the front of the list (when an item is removed, the pointer to the new front item is obtained under the same lock). Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Link: https://sashiko.dev/#/patchset/20260326104544.509518-1-dhowells%40redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_read.c | 3 ++- fs/netfs/misc.c | 1 + fs/netfs/read_collect.c | 6 ++++-- fs/netfs/write_collect.c | 6 ++++-- fs/netfs/write_issue.c | 3 ++- include/linux/list.h | 37 +++++++++++++++++++++++++++++++++++++ 6 files changed, 50 insertions(+), 6 deletions(-) diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index a27ed501b6d4..15d73026ff64 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -168,7 +168,8 @@ void netfs_queue_read(struct netfs_io_request *rreq, * remove entries off of the front. */ spin_lock(&rreq->lock); - list_add_tail(&subreq->rreq_link, &stream->subrequests); + /* Write IN_PROGRESS before pointer to new subreq */ + list_add_tail_release(&subreq->rreq_link, &stream->subrequests); if (list_is_first(&subreq->rreq_link, &stream->subrequests)) { if (!stream->active) { stream->collected_to =3D subreq->start; diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c index 6df89c92b10b..21357907b7ee 100644 --- a/fs/netfs/misc.c +++ b/fs/netfs/misc.c @@ -356,6 +356,7 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq, DEFINE_WAIT(myself); =20 list_for_each_entry(subreq, &stream->subrequests, rreq_link) { + smp_rmb(); /* Read ->next before IN_PROGRESS. 
*/ if (!netfs_check_subreq_in_progress(subreq)) continue; =20 diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index d2d902f46627..3c9b847885c2 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -205,8 +205,10 @@ static void netfs_collect_read_results(struct netfs_io= _request *rreq) * in progress. The issuer thread may be adding stuff to the tail * whilst we're doing this. */ - front =3D list_first_entry_or_null(&stream->subrequests, - struct netfs_io_subrequest, rreq_link); + front =3D list_first_entry_or_null_acquire(&stream->subrequests, + struct netfs_io_subrequest, rreq_link); + /* Read first subreq pointer before IN_PROGRESS flag. */ + while (front) { size_t transferred; =20 diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c index b194447f4b11..7fbf50907a7f 100644 --- a/fs/netfs/write_collect.c +++ b/fs/netfs/write_collect.c @@ -228,8 +228,10 @@ static void netfs_collect_write_results(struct netfs_i= o_request *wreq) if (!smp_load_acquire(&stream->active)) continue; =20 - front =3D list_first_entry_or_null(&stream->subrequests, - struct netfs_io_subrequest, rreq_link); + front =3D list_first_entry_or_null_acquire(&stream->subrequests, + struct netfs_io_subrequest, rreq_link); + /* Read first subreq pointer before IN_PROGRESS flag. */ + while (front) { trace_netfs_collect_sreq(wreq, front); //_debug("sreq [%x] %llx %zx/%zx", diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c index 2db688f94125..b0e9690bb90c 100644 --- a/fs/netfs/write_issue.c +++ b/fs/netfs/write_issue.c @@ -204,7 +204,8 @@ void netfs_prepare_write(struct netfs_io_request *wreq, * remove entries off of the front. 
*/ spin_lock(&wreq->lock); - list_add_tail(&subreq->rreq_link, &stream->subrequests); + /* Write IN_PROGRESS before pointer to new subreq */ + list_add_tail_release(&subreq->rreq_link, &stream->subrequests); if (list_is_first(&subreq->rreq_link, &stream->subrequests)) { if (!stream->active) { stream->collected_to =3D subreq->start; diff --git a/include/linux/list.h b/include/linux/list.h index 00ea8e5fb88b..09d979976b3b 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -191,6 +191,29 @@ static inline void list_add_tail(struct list_head *new= , struct list_head *head) __list_add(new, head->prev, head); } =20 +/** + * list_add_tail_release - add a new entry with release barrier + * @new: new entry to be added + * @head: list head to add it before + * + * Insert a new entry before the specified head, using a release barrier t= o set + * the ->next pointer that points to it. This is useful for implementing + * queues, in particular one that the elements will be walked through forw= ards + * locklessly. + */ +static inline void list_add_tail_release(struct list_head *new, + struct list_head *head) +{ + struct list_head *prev =3D head->prev; + + if (__list_add_valid(new, prev, head)) { + new->next =3D head; + new->prev =3D prev; + head->prev =3D new; + smp_store_release(&prev->next, new); + } +} + /* * Delete a list entry by making the prev/next entries * point to each other. @@ -644,6 +667,20 @@ static inline void list_splice_tail_init(struct list_h= ead *list, pos__ !=3D head__ ? list_entry(pos__, type, member) : NULL; \ }) =20 +/** + * list_first_entry_or_null_acquire - get the first element from a list wi= th barrier + * @ptr: the list head to take the element from. + * @type: the type of the struct this is embedded in. + * @member: the name of the list_head within the struct. + * + * Note that if the list is empty, it returns NULL. 
+ */ +#define list_first_entry_or_null_acquire(ptr, type, member) ({ \ + struct list_head *head__ =3D (ptr); \ + struct list_head *pos__ =3D smp_load_acquire(&head__->next); \ + pos__ !=3D head__ ? list_entry(pos__, type, member) : NULL; \ +}) + /** * list_last_entry_or_null - get the last element from a list * @ptr: the list head to take the element from. From nobody Fri Jun 12 21:38:10 2026
From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 04/24] netfs: Fix netfs_read_to_pagecache() to pause on
subreq failure Date: Tue, 12 May 2026 13:33:41 +0100 Message-ID: <20260512123404.719402-5-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Fix netfs_read_to_pagecache() so that it pauses the generation of new subrequests if an already-issued subrequest fails. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40= redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_read.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index 15d73026ff64..fee0aebf5a3d 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -300,6 +300,11 @@ static void netfs_read_to_pagecache(struct netfs_io_re= quest *rreq, } =20 netfs_issue_read(rreq, subreq); + + if (test_bit(NETFS_RREQ_PAUSE, &rreq->flags)) + netfs_wait_for_paused_read(rreq); + if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) + break; cond_resched(); } while (size > 0); From nobody Fri Jun 12 21:38:10 2026
From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 05/24] netfs: Fix potential for tearing in ->remote_i_size and ->zero_point Date: Tue, 12 May 2026 13:33:42 +0100 Message-ID: <20260512123404.719402-6-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Fix potential tearing in using ->remote_i_size and ->zero_point by copying i_size_read() and i_size_write() and using the same seqcount as for i_size. We need to make sure that netfslib and the filesystems that use it always hold i_lock whilst updating any of the sizes to prevent i_size_seqcount from getting corrupted.
Fixes: 4058f742105e ("netfs: Keep track of the actual remote file size") Fixes: 100ccd18bb41 ("netfs: Optimise away reads above the point at which t= here can be no data") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40= redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/9p/v9fs_vfs.h | 13 -- fs/9p/vfs_inode.c | 6 +- fs/9p/vfs_inode_dotl.c | 12 +- fs/afs/file.c | 24 +++- fs/afs/inode.c | 31 ++-- fs/afs/internal.h | 11 +- fs/afs/write.c | 2 +- fs/netfs/buffered_read.c | 6 +- fs/netfs/buffered_write.c | 2 +- fs/netfs/direct_write.c | 6 +- fs/netfs/misc.c | 32 +++-- fs/netfs/write_collect.c | 9 +- fs/smb/client/cifsfs.c | 38 +++-- fs/smb/client/cifssmb.c | 3 +- fs/smb/client/file.c | 13 +- fs/smb/client/inode.c | 14 +- fs/smb/client/readdir.c | 3 +- fs/smb/client/smb2ops.c | 42 +++--- fs/smb/client/smb2pdu.c | 3 +- include/linux/netfs.h | 293 ++++++++++++++++++++++++++++++++++++-- 20 files changed, 450 insertions(+), 113 deletions(-) diff --git a/fs/9p/v9fs_vfs.h b/fs/9p/v9fs_vfs.h index d3aefbec4de6..34c115d7c250 100644 --- a/fs/9p/v9fs_vfs.h +++ b/fs/9p/v9fs_vfs.h @@ -75,17 +75,4 @@ static inline void v9fs_invalidate_inode_attr(struct ino= de *inode) =20 int v9fs_open_to_dotl_flags(int flags); =20 -static inline void v9fs_i_size_write(struct inode *inode, loff_t i_size) -{ - /* - * 32-bit need the lock, concurrent updates could break the - * sequences and make i_size_read() loop forever. - * 64-bit updates are atomic and can skip the locking. 
- */ - if (sizeof(i_size) > sizeof(long)) - spin_lock(&inode->i_lock); - i_size_write(inode, i_size); - if (sizeof(i_size) > sizeof(long)) - spin_unlock(&inode->i_lock); -} #endif diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c index d1508b1fe109..f468acb8ee7d 100644 --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -1141,11 +1141,13 @@ v9fs_stat2inode(struct p9_wstat *stat, struct inode= *inode, mode |=3D inode->i_mode & ~S_IALLUGO; inode->i_mode =3D mode; =20 - v9inode->netfs.remote_i_size =3D stat->length; + spin_lock(&inode->i_lock); + netfs_write_remote_i_size(inode, stat->length); if (!(flags & V9FS_STAT2INODE_KEEP_ISIZE)) - v9fs_i_size_write(inode, stat->length); + i_size_write(inode, stat->length); /* not real number of blocks, but 512 byte ones ... */ inode->i_blocks =3D (stat->length + 512 - 1) >> 9; + spin_unlock(&inode->i_lock); v9inode->cache_validity &=3D ~V9FS_INO_INVALID_ATTR; } =20 diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c index 71796a89bcf4..141fb54db65d 100644 --- a/fs/9p/vfs_inode_dotl.c +++ b/fs/9p/vfs_inode_dotl.c @@ -634,10 +634,12 @@ v9fs_stat2inode_dotl(struct p9_stat_dotl *stat, struc= t inode *inode, mode |=3D inode->i_mode & ~S_IALLUGO; inode->i_mode =3D mode; =20 - v9inode->netfs.remote_i_size =3D stat->st_size; + spin_lock(&inode->i_lock); + netfs_write_remote_i_size(inode, stat->st_size); if (!(flags & V9FS_STAT2INODE_KEEP_ISIZE)) - v9fs_i_size_write(inode, stat->st_size); + i_size_write(inode, stat->st_size); inode->i_blocks =3D stat->st_blocks; + spin_unlock(&inode->i_lock); } else { if (stat->st_result_mask & P9_STATS_ATIME) { inode_set_atime(inode, stat->st_atime_sec, @@ -662,13 +664,15 @@ v9fs_stat2inode_dotl(struct p9_stat_dotl *stat, struc= t inode *inode, mode |=3D inode->i_mode & ~S_IALLUGO; inode->i_mode =3D mode; } + spin_lock(&inode->i_lock); if (!(flags & V9FS_STAT2INODE_KEEP_ISIZE) && stat->st_result_mask & P9_STATS_SIZE) { - v9inode->netfs.remote_i_size =3D stat->st_size; - 
v9fs_i_size_write(inode, stat->st_size); + netfs_write_remote_i_size(inode, stat->st_size); + i_size_write(inode, stat->st_size); } if (stat->st_result_mask & P9_STATS_BLOCKS) inode->i_blocks =3D stat->st_blocks; + spin_unlock(&inode->i_lock); } if (stat->st_result_mask & P9_STATS_GEN) inode->i_generation =3D stat->st_gen; diff --git a/fs/afs/file.c b/fs/afs/file.c index 85696ac984cc..0467742bfeee 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -427,21 +427,35 @@ static void afs_free_request(struct netfs_io_request = *rreq) afs_put_wb_key(rreq->netfs_priv2); } =20 -static void afs_update_i_size(struct inode *inode, loff_t new_i_size) +/* + * Set the file size and block count, taking ->cb_lock and ->i_lock to mai= ntain + * coherency and prevent 64-bit tearing on 32-bit arches. + * + * Also, estimate the number of 512 bytes blocks used, rounded up to neare= st 1K + * for consistency with other AFS clients. + */ +void afs_set_i_size(struct afs_vnode *vnode, loff_t new_i_size) { - struct afs_vnode *vnode =3D AFS_FS_I(inode); + struct inode *inode =3D &vnode->netfs.inode; loff_t i_size; =20 write_seqlock(&vnode->cb_lock); - i_size =3D i_size_read(&vnode->netfs.inode); + spin_lock(&inode->i_lock); + i_size =3D i_size_read(inode); if (new_i_size > i_size) { - i_size_write(&vnode->netfs.inode, new_i_size); - inode_set_bytes(&vnode->netfs.inode, new_i_size); + i_size_write(inode, new_i_size); + inode_set_bytes(inode, round_up(new_i_size, 1024)); } + spin_unlock(&inode->i_lock); write_sequnlock(&vnode->cb_lock); fscache_update_cookie(afs_vnode_cache(vnode), NULL, &new_i_size); } =20 +static void afs_update_i_size(struct inode *inode, loff_t new_i_size) +{ + afs_set_i_size(AFS_FS_I(inode), new_i_size); +} + static void afs_netfs_invalidate_cache(struct netfs_io_request *wreq) { struct afs_vnode *vnode =3D AFS_FS_I(wreq->inode); diff --git a/fs/afs/inode.c b/fs/afs/inode.c index a5173434f786..19fe2e392885 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -224,7 +224,8 @@ 
static int afs_inode_init_from_status(struct afs_operat= ion *op, return afs_protocol_error(NULL, afs_eproto_file_type); } =20 - afs_set_i_size(vnode, status->size); + i_size_write(inode, status->size); + inode_set_bytes(inode, status->size); afs_set_netfs_context(vnode); =20 vnode->invalid_before =3D status->data_version; @@ -253,7 +254,8 @@ static void afs_apply_status(struct afs_operation *op, { struct afs_file_status *status =3D &vp->scb.status; struct afs_vnode *vnode =3D vp->vnode; - struct inode *inode =3D &vnode->netfs.inode; + struct netfs_inode *ictx =3D &vnode->netfs; + struct inode *inode =3D &ictx->inode; struct timespec64 t; umode_t mode; bool unexpected_jump =3D false; @@ -336,6 +338,8 @@ static void afs_apply_status(struct afs_operation *op, } =20 if (data_changed) { + unsigned long long zero_point, size =3D status->size; + inode_set_iversion_raw(inode, status->data_version); =20 /* Only update the size if the data version jumped. If the @@ -343,16 +347,25 @@ static void afs_apply_status(struct afs_operation *op, * idea of what the size should be that's not the same as * what's on the server. */ - vnode->netfs.remote_i_size =3D status->size; - if (change_size || status->size > i_size_read(inode)) { - afs_set_i_size(vnode, status->size); + spin_lock(&inode->i_lock); + + if (change_size || size > i_size_read(inode)) { + /* We can read the sizes directly as we hold i_lock. 
*/ + zero_point =3D ictx->_zero_point; + if (unexpected_jump) - vnode->netfs.zero_point =3D status->size; + zero_point =3D size; + netfs_write_sizes(inode, size, size, zero_point); + inode_set_bytes(inode, size); inode_set_ctime_to_ts(inode, t); inode_set_atime_to_ts(inode, t); + } else { + netfs_write_remote_i_size(inode, size); } + spin_unlock(&inode->i_lock); + if (op->ops =3D=3D &afs_fetch_data_operation) - op->fetch.subreq->rreq->i_size =3D status->size; + op->fetch.subreq->rreq->i_size =3D size; } } =20 @@ -709,7 +722,7 @@ int afs_getattr(struct mnt_idmap *idmap, const struct p= ath *path, * it, but we need to give userspace the server's size. */ if (S_ISDIR(inode->i_mode)) - stat->size =3D vnode->netfs.remote_i_size; + stat->size =3D netfs_read_remote_i_size(inode); } while (read_seqretry(&vnode->cb_lock, seq)); =20 return 0; @@ -889,7 +902,7 @@ int afs_setattr(struct mnt_idmap *idmap, struct dentry = *dentry, */ if (!(attr->ia_valid & (supported & ~ATTR_SIZE & ~ATTR_MTIME)) && attr->ia_size < i_size && - attr->ia_size > vnode->netfs.remote_i_size) { + attr->ia_size > netfs_read_remote_i_size(inode)) { truncate_setsize(inode, attr->ia_size); netfs_resize_file(&vnode->netfs, size, false); fscache_resize_cookie(afs_vnode_cache(vnode), diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 599353c33337..816dc848ea71 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1157,6 +1157,7 @@ extern int afs_open(struct inode *, struct file *); extern int afs_release(struct inode *, struct file *); void afs_fetch_data_async_rx(struct work_struct *work); void afs_fetch_data_immediate_cancel(struct afs_call *call); +void afs_set_i_size(struct afs_vnode *vnode, loff_t new_i_size); =20 /* * flock.c @@ -1758,16 +1759,6 @@ static inline void afs_update_dentry_version(struct = afs_operation *op, (void *)(unsigned long)dir_vp->scb.status.data_version; } =20 -/* - * Set the file size and block count. 
Estimate the number of 512 bytes bl= ocks - * used, rounded up to nearest 1K for consistency with other AFS clients. - */ -static inline void afs_set_i_size(struct afs_vnode *vnode, u64 size) -{ - i_size_write(&vnode->netfs.inode, size); - vnode->netfs.inode.i_blocks =3D ((size + 1023) >> 10) << 1; -} - /* * Check for a conflicting operation on a directory that we just unlinked = from. * If someone managed to sneak a link or an unlink in on the file we just diff --git a/fs/afs/write.c b/fs/afs/write.c index fcfed9d24e0a..7f34b939706a 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -142,7 +142,7 @@ static void afs_issue_write_worker(struct work_struct *= work) afs_begin_vnode_operation(op); =20 op->store.write_iter =3D &subreq->io_iter; - op->store.i_size =3D umax(pos + len, vnode->netfs.remote_i_size); + op->store.i_size =3D umax(pos + len, netfs_read_remote_i_size(&vnode->net= fs.inode)); op->mtime =3D inode_get_mtime(&vnode->netfs.inode); =20 afs_wait_for_operation(op); diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index fee0aebf5a3d..ebd84a6cc3f0 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -209,7 +209,6 @@ static void netfs_issue_read(struct netfs_io_request *r= req, static void netfs_read_to_pagecache(struct netfs_io_request *rreq, struct readahead_control *ractl) { - struct netfs_inode *ictx =3D netfs_inode(rreq->inode); unsigned long long start =3D rreq->start; ssize_t size =3D rreq->len; int ret =3D 0; @@ -233,7 +232,8 @@ static void netfs_read_to_pagecache(struct netfs_io_req= uest *rreq, source =3D netfs_cache_prepare_read(rreq, subreq, rreq->i_size); subreq->source =3D source; if (source =3D=3D NETFS_DOWNLOAD_FROM_SERVER) { - unsigned long long zp =3D umin(ictx->zero_point, rreq->i_size); + unsigned long long zero_point =3D netfs_read_zero_point(rreq->inode); + unsigned long long zp =3D umin(zero_point, rreq->i_size); size_t len =3D subreq->len; =20 if (unlikely(rreq->origin =3D=3D NETFS_READ_SINGLE)) @@ 
-249,7 +249,7 @@ static void netfs_read_to_pagecache(struct netfs_io_req= uest *rreq, pr_err("ZERO-LEN READ: R=3D%08x[%x] l=3D%zx/%zx s=3D%llx z=3D%llx i=3D= %llx", rreq->debug_id, subreq->debug_index, subreq->len, size, - subreq->start, ictx->zero_point, rreq->i_size); + subreq->start, zero_point, rreq->i_size); netfs_cancel_read(subreq, ret); break; } diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index 05ea5b0cc0e8..b6ecd059dc4f 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -230,7 +230,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, * server would just return a block of zeros or a short read if * we try to read it. */ - if (fpos >=3D ctx->zero_point) { + if (fpos >=3D netfs_read_zero_point(inode)) { folio_zero_segment(folio, 0, offset); copied =3D copy_folio_from_iter_atomic(folio, offset, part, iter); if (unlikely(copied =3D=3D 0)) diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c index f9ab69de3e29..25f8ceb15fad 100644 --- a/fs/netfs/direct_write.c +++ b/fs/netfs/direct_write.c @@ -376,8 +376,10 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb= , struct iov_iter *from) if (ret < 0) goto out; end =3D iocb->ki_pos + iov_iter_count(from); - if (end > ictx->zero_point) - ictx->zero_point =3D end; + spin_lock(&inode->i_lock); + if (end > ictx->_zero_point) + netfs_write_zero_point(inode, end); + spin_unlock(&inode->i_lock); =20 fscache_invalidate(netfs_i_cookie(ictx), NULL, i_size_read(inode), FSCACHE_INVAL_DIO_WRITE); diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c index 21357907b7ee..bad661ff2bec 100644 --- a/fs/netfs/misc.c +++ b/fs/netfs/misc.c @@ -211,18 +211,25 @@ EXPORT_SYMBOL(netfs_clear_inode_writeback); void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t len= gth) { struct netfs_folio *finfo; - struct netfs_inode *ctx =3D netfs_inode(folio_inode(folio)); + struct inode *inode =3D folio_inode(folio); + struct netfs_inode *ctx =3D 
netfs_inode(inode); size_t flen =3D folio_size(folio); =20 _enter("{%lx},%zx,%zx", folio->index, offset, length); =20 if (offset =3D=3D 0 && length =3D=3D flen) { - unsigned long long i_size =3D i_size_read(&ctx->inode); + unsigned long long i_size, remote_i_size, zero_point; unsigned long long fpos =3D folio_pos(folio), end; =20 + netfs_read_sizes(inode, &i_size, &remote_i_size, &zero_point); end =3D umin(fpos + flen, i_size); - if (fpos < i_size && end > ctx->zero_point) - ctx->zero_point =3D end; + if (fpos < i_size && end > zero_point) { + spin_lock(&inode->i_lock); + end =3D umin(fpos + flen, inode->i_size); + if (fpos < i_size && end > ctx->_zero_point) + netfs_write_zero_point(inode, end); + spin_unlock(&inode->i_lock); + } } =20 folio_wait_private_2(folio); /* [DEPRECATED] */ @@ -292,15 +299,22 @@ EXPORT_SYMBOL(netfs_invalidate_folio); */ bool netfs_release_folio(struct folio *folio, gfp_t gfp) { - struct netfs_inode *ctx =3D netfs_inode(folio_inode(folio)); - unsigned long long end; + struct inode *inode =3D folio_inode(folio); + struct netfs_inode *ctx =3D netfs_inode(inode); + unsigned long long i_size, remote_i_size, zero_point, end; =20 if (folio_test_dirty(folio)) return false; =20 - end =3D umin(folio_next_pos(folio), i_size_read(&ctx->inode)); - if (end > ctx->zero_point) - ctx->zero_point =3D end; + netfs_read_sizes(inode, &i_size, &remote_i_size, &zero_point); + end =3D umin(folio_next_pos(folio), i_size); + if (end > zero_point) { + spin_lock(&inode->i_lock); + end =3D umin(folio_next_pos(folio), inode->i_size); + if (end > ctx->_zero_point) + netfs_write_zero_point(inode, end); + spin_unlock(&inode->i_lock); + } =20 if (folio_test_private(folio)) return false; diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c index 7fbf50907a7f..24fc2bb2f8a4 100644 --- a/fs/netfs/write_collect.c +++ b/fs/netfs/write_collect.c @@ -57,7 +57,8 @@ static void netfs_dump_request(const struct netfs_io_requ= est *rreq) int netfs_folio_written_back(struct 
folio *folio) { enum netfs_folio_trace why =3D netfs_folio_trace_clear; - struct netfs_inode *ictx =3D netfs_inode(folio->mapping->host); + struct inode *inode =3D folio_inode(folio); + struct netfs_inode *ictx =3D netfs_inode(inode); struct netfs_folio *finfo; struct netfs_group *group =3D NULL; int gcount =3D 0; @@ -69,8 +70,10 @@ int netfs_folio_written_back(struct folio *folio) unsigned long long fend; =20 fend =3D folio_pos(folio) + finfo->dirty_offset + finfo->dirty_len; - if (fend > ictx->zero_point) - ictx->zero_point =3D fend; + spin_lock(&ictx->inode.i_lock); + if (fend > ictx->_zero_point) + netfs_write_zero_point(inode, fend); + spin_unlock(&ictx->inode.i_lock); =20 folio_detach_private(folio); group =3D finfo->netfs_group; diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c index 9f76b0347fa9..feac491c5070 100644 --- a/fs/smb/client/cifsfs.c +++ b/fs/smb/client/cifsfs.c @@ -434,7 +434,8 @@ cifs_alloc_inode(struct super_block *sb) spin_lock_init(&cifs_inode->writers_lock); cifs_inode->writers =3D 0; cifs_inode->netfs.inode.i_blkbits =3D 14; /* 2**14 =3D CIFS_MAX_MSGSIZE = */ - cifs_inode->netfs.remote_i_size =3D 0; + cifs_inode->netfs._remote_i_size =3D 0; + cifs_inode->netfs._zero_point =3D 0; cifs_inode->uniqueid =3D 0; cifs_inode->createtime =3D 0; cifs_inode->epoch =3D 0; @@ -1303,7 +1304,8 @@ static loff_t cifs_remap_file_range(struct file *src_= file, loff_t off, struct cifsFileInfo *smb_file_src =3D src_file->private_data; struct cifsFileInfo *smb_file_target =3D dst_file->private_data; struct cifs_tcon *target_tcon, *src_tcon; - unsigned long long destend, fstart, fend, old_size, new_size; + unsigned long long i_size, old_size, new_size, zero_point; + unsigned long long destend, fstart, fend; unsigned int xid; int rc; =20 @@ -1347,7 +1349,7 @@ static loff_t cifs_remap_file_range(struct file *src_= file, loff_t off, * Advance the EOF marker after the flush above to the end of the range * if it's short of that. 
*/ - if (src_cifsi->netfs.remote_i_size < off + len) { + if (netfs_read_remote_i_size(src_inode) < off + len) { rc =3D cifs_precopy_set_eof(src_inode, src_cifsi, src_tcon, xid, off + l= en); if (rc < 0) goto unlock; @@ -1368,16 +1370,18 @@ static loff_t cifs_remap_file_range(struct file *sr= c_file, loff_t off, rc =3D cifs_flush_folio(target_inode, destend, &fstart, &fend, false); if (rc) goto unlock; - if (fend > target_cifsi->netfs.zero_point) - target_cifsi->netfs.zero_point =3D fend + 1; - old_size =3D target_cifsi->netfs.remote_i_size; + + spin_lock(&target_inode->i_lock); + if (fend > target_cifsi->netfs._zero_point) + netfs_write_zero_point(target_inode, fend + 1); + i_size =3D target_inode->i_size; + spin_unlock(&target_inode->i_lock); =20 /* Discard all the folios that overlap the destination region. */ cifs_dbg(FYI, "about to discard pages %llx-%llx\n", fstart, fend); truncate_inode_pages_range(&target_inode->i_data, fstart, fend); =20 - fscache_invalidate(cifs_inode_cookie(target_inode), NULL, - i_size_read(target_inode), 0); + fscache_invalidate(cifs_inode_cookie(target_inode), NULL, i_size, 0); =20 rc =3D -EOPNOTSUPP; if (target_tcon->ses->server->ops->duplicate_extents) { @@ -1402,8 +1406,12 @@ static loff_t cifs_remap_file_range(struct file *src= _file, loff_t off, rc =3D -EINVAL; } } - if (rc =3D=3D 0 && new_size > target_cifsi->netfs.zero_point) - target_cifsi->netfs.zero_point =3D new_size; + if (rc =3D=3D 0) { + spin_lock(&target_inode->i_lock); + if (new_size > target_cifsi->netfs._zero_point) + netfs_write_zero_point(target_inode, new_size); + spin_unlock(&target_inode->i_lock); + } } =20 /* force revalidate of size and timestamps of target file now @@ -1474,7 +1482,7 @@ ssize_t cifs_file_copychunk_range(unsigned int xid, * Advance the EOF marker after the flush above to the end of the range * if it's short of that.
*/ - if (src_cifsi->netfs.remote_i_size < off + len) { + if (netfs_read_remote_i_size(src_inode) < off + len) { rc =3D cifs_precopy_set_eof(src_inode, src_cifsi, src_tcon, xid, off + l= en); if (rc < 0) goto unlock; @@ -1502,8 +1510,12 @@ ssize_t cifs_file_copychunk_range(unsigned int xid, fscache_resize_cookie(cifs_inode_cookie(target_inode), i_size_read(target_inode)); } - if (rc > 0 && destoff + rc > target_cifsi->netfs.zero_point) - target_cifsi->netfs.zero_point =3D destoff + rc; + if (rc > 0) { + spin_lock(&target_inode->i_lock); + if (destoff + rc > target_cifsi->netfs._zero_point) + netfs_write_zero_point(target_inode, destoff + rc); + spin_unlock(&target_inode->i_lock); + } } =20 file_accessed(src_file); diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c index 3990a9012264..9e27bfa7376b 100644 --- a/fs/smb/client/cifssmb.c +++ b/fs/smb/client/cifssmb.c @@ -1465,6 +1465,7 @@ cifs_readv_callback(struct TCP_Server_Info *server, s= truct mid_q_entry *mid) struct cifs_io_subrequest *rdata =3D mid->callback_data; struct netfs_inode *ictx =3D netfs_inode(rdata->rreq->inode); struct cifs_tcon *tcon =3D tlink_tcon(rdata->req->cfile->tlink); + struct inode *inode =3D &ictx->inode; struct smb_rqst rqst =3D { .rq_iov =3D rdata->iov, .rq_nvec =3D 1, .rq_iter =3D rdata->subreq.io_iter }; @@ -1538,7 +1539,7 @@ cifs_readv_callback(struct TCP_Server_Info *server, s= truct mid_q_entry *mid) } else { size_t trans =3D rdata->subreq.transferred + rdata->got_bytes; if (trans < rdata->subreq.len && - rdata->subreq.start + trans >=3D ictx->remote_i_size) { + rdata->subreq.start + trans >=3D netfs_read_remote_i_size(inode)) { rdata->result =3D 0; __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags); } else if (rdata->got_bytes > 0) { diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index 664a2c223089..b60344125f27 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -2517,18 +2517,23 @@ int cifs_lock(struct file *file, int cmd, struct fi= le_lock 
*flock) void cifs_write_subrequest_terminated(struct cifs_io_subrequest *wdata, ss= ize_t result) { struct netfs_io_request *wreq =3D wdata->rreq; - struct netfs_inode *ictx =3D netfs_inode(wreq->inode); + struct inode *inode =3D wreq->inode; + struct netfs_inode *ictx =3D netfs_inode(inode); loff_t wrend; =20 if (result > 0) { + spin_lock(&inode->i_lock); + wrend =3D wdata->subreq.start + wdata->subreq.transferred + result; =20 - if (wrend > ictx->zero_point && + if (wrend > ictx->_zero_point && (wdata->rreq->origin =3D=3D NETFS_UNBUFFERED_WRITE || wdata->rreq->origin =3D=3D NETFS_DIO_WRITE)) - ictx->zero_point =3D wrend; - if (wrend > ictx->remote_i_size) + netfs_write_zero_point(inode, wrend); + if (wrend > ictx->_remote_i_size) netfs_resize_file(ictx, wrend, true); + + spin_unlock(&inode->i_lock); } =20 netfs_write_subrequest_terminated(&wdata->subreq, result); diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index 16a5310155d5..9472c0a6c187 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -119,7 +119,7 @@ cifs_revalidate_cache(struct inode *inode, struct cifs_= fattr *fattr) fattr->cf_mtime =3D timestamp_truncate(fattr->cf_mtime, inode); mtime =3D inode_get_mtime(inode); if (timespec64_equal(&mtime, &fattr->cf_mtime) && - cifs_i->netfs.remote_i_size =3D=3D fattr->cf_eof) { + netfs_read_remote_i_size(inode) =3D=3D fattr->cf_eof) { cifs_dbg(FYI, "%s: inode %llu is unchanged\n", __func__, cifs_i->uniqueid); return; @@ -173,12 +173,12 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_= fattr *fattr, CIFS_I(inode)->time =3D 0; /* force reval */ return -ESTALE; } - if (inode_state_read_once(inode) & I_NEW) - CIFS_I(inode)->netfs.zero_point =3D fattr->cf_eof; - cifs_revalidate_cache(inode, fattr); =20 spin_lock(&inode->i_lock); + if (inode_state_read_once(inode) & I_NEW) + netfs_write_zero_point(inode, fattr->cf_eof); + fattr->cf_mtime =3D timestamp_truncate(fattr->cf_mtime, inode); fattr->cf_atime =3D 
timestamp_truncate(fattr->cf_atime, inode); fattr->cf_ctime =3D timestamp_truncate(fattr->cf_ctime, inode); @@ -212,7 +212,7 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fa= ttr *fattr, else clear_bit(CIFS_INO_DELETE_PENDING, &cifs_i->flags); =20 - cifs_i->netfs.remote_i_size =3D fattr->cf_eof; + netfs_write_remote_i_size(inode, fattr->cf_eof); /* * Can't safely change the file size here if the client is writing to * it due to potential races. @@ -2772,7 +2772,9 @@ cifs_revalidate_mapping(struct inode *inode) if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_RW_CACHE) goto skip_invalidate; =20 - cifs_inode->netfs.zero_point =3D cifs_inode->netfs.remote_i_size; + spin_lock(&inode->i_lock); + netfs_write_zero_point(inode, netfs_inode(inode)->_remote_i_size); + spin_unlock(&inode->i_lock); rc =3D filemap_invalidate_inode(inode, true, 0, LLONG_MAX); if (rc) { cifs_dbg(VFS, "%s: invalidate inode %p failed with rc %d\n", diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c index be22bbc4a65a..e860fa08b5e3 100644 --- a/fs/smb/client/readdir.c +++ b/fs/smb/client/readdir.c @@ -143,7 +143,8 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *n= ame, fattr->cf_rdev =3D inode->i_rdev; fattr->cf_uid =3D inode->i_uid; fattr->cf_gid =3D inode->i_gid; - fattr->cf_eof =3D CIFS_I(inode)->netfs.remote_i_size; + fattr->cf_eof =3D + netfs_read_remote_i_size(inode); fattr->cf_symlink_target =3D NULL; } else { CIFS_I(inode)->time =3D 0; diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index e6cb9b144530..0ea3ce1b94ea 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -3402,8 +3402,7 @@ static long smb3_zero_range(struct file *file, struct= cifs_tcon *tcon, struct inode *inode =3D file_inode(file); struct cifsInodeInfo *cifsi =3D CIFS_I(inode); struct cifsFileInfo *cfile =3D file->private_data; - struct netfs_inode *ictx =3D netfs_inode(inode); - unsigned long long i_size, new_size, remote_size; + unsigned long long i_size, new_size, 
remote_i_size, zero_point; long rc; unsigned int xid; =20 @@ -3414,9 +3413,8 @@ static long smb3_zero_range(struct file *file, struct= cifs_tcon *tcon, =20 filemap_invalidate_lock(inode->i_mapping); =20 - i_size =3D i_size_read(inode); - remote_size =3D ictx->remote_i_size; - if (offset + len >=3D remote_size && offset < i_size) { + netfs_read_sizes(inode, &i_size, &remote_i_size, &zero_point); + if (offset + len >=3D remote_i_size && offset < i_size) { unsigned long long top =3D umin(offset + len, i_size); =20 rc =3D filemap_write_and_wait_range(inode->i_mapping, offset, top - 1); @@ -3449,9 +3447,11 @@ static long smb3_zero_range(struct file *file, struc= t cifs_tcon *tcon, cfile->fid.volatile_fid, cfile->pid, new_size); if (rc >=3D 0) { truncate_setsize(inode, new_size); + spin_lock(&inode->i_lock); netfs_resize_file(&cifsi->netfs, new_size, true); - if (offset < cifsi->netfs.zero_point) - cifsi->netfs.zero_point =3D offset; + if (offset < cifsi->netfs._zero_point) + netfs_write_zero_point(inode, offset); + spin_unlock(&inode->i_lock); fscache_resize_cookie(cifs_inode_cookie(inode), new_size); } } @@ -3474,7 +3474,7 @@ static long smb3_punch_hole(struct file *file, struct= cifs_tcon *tcon, struct inode *inode =3D file_inode(file); struct cifsFileInfo *cfile =3D file->private_data; struct file_zero_data_information fsctl_buf; - unsigned long long end =3D offset + len, i_size, remote_i_size; + unsigned long long end =3D offset + len, i_size, remote_i_size, zero_poin= t; long rc; unsigned int xid; __u8 set_sparse =3D 1; @@ -3516,14 +3516,17 @@ static long smb3_punch_hole(struct file *file, stru= ct cifs_tcon *tcon, * that we locally hole-punch the tail of the dirty data, the proposed * EOF update will end up in the wrong place. 
*/ - i_size =3D i_size_read(inode); - remote_i_size =3D netfs_inode(inode)->remote_i_size; + netfs_read_sizes(inode, &i_size, &remote_i_size, &zero_point); + if (end > remote_i_size && i_size > remote_i_size) { unsigned long long extend_to =3D umin(end, i_size); rc =3D SMB2_set_eof(xid, tcon, cfile->fid.persistent_fid, cfile->fid.volatile_fid, cfile->pid, extend_to); - if (rc >=3D 0) - netfs_inode(inode)->remote_i_size =3D extend_to; + if (rc >=3D 0) { + spin_lock(&inode->i_lock); + netfs_write_remote_i_size(inode, extend_to); + spin_unlock(&inode->i_lock); + } } =20 unlock: @@ -3787,7 +3790,6 @@ static long smb3_collapse_range(struct file *file, st= ruct cifs_tcon *tcon, struct inode *inode =3D file_inode(file); struct cifsInodeInfo *cifsi =3D CIFS_I(inode); struct cifsFileInfo *cfile =3D file->private_data; - struct netfs_inode *ictx =3D &cifsi->netfs; loff_t old_eof, new_eof; =20 xid =3D get_xid(); @@ -3805,7 +3807,9 @@ static long smb3_collapse_range(struct file *file, st= ruct cifs_tcon *tcon, goto out_2; =20 truncate_pagecache_range(inode, off, old_eof); - ictx->zero_point =3D old_eof; + spin_lock(&inode->i_lock); + netfs_write_zero_point(inode, old_eof); + spin_unlock(&inode->i_lock); netfs_wait_for_outstanding_io(inode); =20 rc =3D smb2_copychunk_range(xid, cfile, cfile, off + len, @@ -3822,8 +3826,10 @@ static long smb3_collapse_range(struct file *file, s= truct cifs_tcon *tcon, rc =3D 0; =20 truncate_setsize(inode, new_eof); + spin_lock(&inode->i_lock); netfs_resize_file(&cifsi->netfs, new_eof, true); - ictx->zero_point =3D new_eof; + netfs_write_zero_point(inode, new_eof); + spin_unlock(&inode->i_lock); fscache_resize_cookie(cifs_inode_cookie(inode), new_eof); out_2: filemap_invalidate_unlock(inode->i_mapping); @@ -3866,13 +3872,17 @@ static long smb3_insert_range(struct file *file, st= ruct cifs_tcon *tcon, goto out_2; =20 truncate_setsize(inode, new_eof); + spin_lock(&inode->i_lock); netfs_resize_file(&cifsi->netfs, i_size_read(inode), true); + 
spin_unlock(&inode->i_lock); fscache_resize_cookie(cifs_inode_cookie(inode), i_size_read(inode)); =20 rc =3D smb2_copychunk_range(xid, cfile, cfile, off, count, off + len); if (rc < 0) goto out_2; - cifsi->netfs.zero_point =3D new_eof; + spin_lock(&inode->i_lock); + netfs_write_zero_point(inode, new_eof); + spin_unlock(&inode->i_lock); =20 rc =3D smb3_zero_data(file, tcon, off, len, xid); if (rc < 0) diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 995fcdd30681..3bd300347f16 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -4608,6 +4608,7 @@ smb2_readv_callback(struct TCP_Server_Info *server, s= truct mid_q_entry *mid) struct netfs_inode *ictx =3D netfs_inode(rdata->rreq->inode); struct cifs_tcon *tcon =3D tlink_tcon(rdata->req->cfile->tlink); struct smb2_hdr *shdr =3D (struct smb2_hdr *)rdata->iov[0].iov_base; + struct inode *inode =3D &ictx->inode; struct cifs_credits credits =3D { .value =3D 0, .instance =3D 0, @@ -4721,7 +4722,7 @@ smb2_readv_callback(struct TCP_Server_Info *server, s= truct mid_q_entry *mid) } else { size_t trans =3D rdata->subreq.transferred + rdata->got_bytes; if (trans < rdata->subreq.len && - rdata->subreq.start + trans >=3D ictx->remote_i_size) { + rdata->subreq.start + trans >=3D netfs_read_remote_i_size(inode)) { __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags); rdata->result =3D 0; } diff --git a/include/linux/netfs.h b/include/linux/netfs.h index ba17ac5bf356..4fd1d796ad73 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -62,8 +62,8 @@ struct netfs_inode { struct fscache_cookie *cache; #endif struct mutex wb_lock; /* Writeback serialisation */ - loff_t remote_i_size; /* Size of the remote file */ - loff_t zero_point; /* Size after which we assume there's no data + loff_t _remote_i_size; /* Size of the remote file */ + loff_t _zero_point; /* Size after which we assume there's no data * on the server */ atomic_t io_count; /* Number of outstanding reqs */ unsigned long flags; 
@@ -474,6 +474,254 @@ static inline struct netfs_inode *netfs_inode(struct = inode *inode) return container_of(inode, struct netfs_inode, inode); } =20 +/** + * netfs_read_remote_i_size - Read remote_i_size safely + * @inode: The inode to access + * + * Read remote_i_size safely without the potential for tearing on 32-bit + * arches. + * + * NOTE: in a 32bit arch with a preemptable kernel and an UP compile the + * i_size_read/write must be atomic with respect to the local cpu (unlike = with + * preempt disabled), but they don't need to be atomic with respect to oth= er + * cpus like in true SMP (so they need either to either locally disable irq + * around the read or for example on x86 they can be still implemented as a + * cmpxchg8b without the need of the lock prefix). For SMP compiles and 6= 4bit + * archs it makes no difference if preempt is enabled or not. + */ +static inline unsigned long long netfs_read_remote_i_size(const struct ino= de *inode) +{ + const struct netfs_inode *ictx =3D container_of(inode, struct netfs_inode= , inode); + unsigned long long remote_i_size; + +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + unsigned int seq; + + do { + seq =3D read_seqcount_begin(&inode->i_size_seqcount); + remote_i_size =3D ictx->_remote_i_size; + } while (read_seqcount_retry(&inode->i_size_seqcount, seq)); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); + remote_i_size =3D ictx->_remote_i_size; + preempt_enable(); +#else + /* Pairs with smp_store_release() in netfs_write_remote_i_size() */ + remote_i_size =3D smp_load_acquire(&ictx->_remote_i_size); +#endif + return remote_i_size; +} + +/* + * netfs_write_remote_i_size - Set remote_i_size safely + * @inode: The inode to access + * @remote_i_size: The new value for the size of the file on the server + * + * Set remote_i_size safely without the potential for tearing on 32-bit ar= ches. + * + * Context: The caller must hold inode->i_lock. 
+ * + * NOTE: unlike netfs_read_remote_i_size(), netfs_write_remote_i_size() do= es + * need locking around it (normally i_rwsem), otherwise on 32bit/SMP an up= date + * of i_size_seqcount can be lost, resulting in subsequent i_size_read() c= alls + * spinning forever. + */ +static inline void netfs_write_remote_i_size(struct inode *inode, + unsigned long long remote_i_size) +{ + struct netfs_inode *ictx =3D netfs_inode(inode); + +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + write_seqcount_begin(&inode->i_size_seqcount); + ictx->_remote_i_size =3D remote_i_size; + write_seqcount_end(&inode->i_size_seqcount); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); + ictx->_remote_i_size =3D remote_i_size; + preempt_enable(); +#else + /* + * Pairs with smp_load_acquire() in netfs_read_remote_i_size() to + * ensure changes related to inode size (such as page contents) are + * visible before we see the changed inode size. + */ + smp_store_release(&ictx->_remote_i_size, remote_i_size); +#endif +} + +/** + * netfs_read_zero_point - Read zero_point safely + * @inode: The inode to access + * + * Read zero_point safely without the potential for tearing on 32-bit + * arches. + * + * NOTE: in a 32bit arch with a preemptable kernel and an UP compile the + * i_size_read/write must be atomic with respect to the local cpu (unlike = with + * preempt disabled), but they don't need to be atomic with respect to oth= er + * cpus like in true SMP (so they need either to either locally disable irq + * around the read or for example on x86 they can be still implemented as a + * cmpxchg8b without the need of the lock prefix). For SMP compiles and 6= 4bit + * archs it makes no difference if preempt is enabled or not. 
+ */ +static inline unsigned long long netfs_read_zero_point(const struct inode = *inode) +{ + struct netfs_inode *ictx =3D container_of(inode, struct netfs_inode, inod= e); + unsigned long long zero_point; + +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + unsigned int seq; + + do { + seq =3D read_seqcount_begin(&inode->i_size_seqcount); + zero_point =3D ictx->_zero_point; + } while (read_seqcount_retry(&inode->i_size_seqcount, seq)); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); + zero_point =3D ictx->_zero_point; + preempt_enable(); +#else + /* Pairs with smp_store_release() in netfs_write_zero_point() */ + zero_point =3D smp_load_acquire(&ictx->_zero_point); +#endif + return zero_point; +} + +/* + * netfs_write_zero_point - Set zero_point safely + * @inode: The inode to access + * @zero_point: The new value for the point beyond which the server has no= data + * + * Set zero_point safely without the potential for tearing on 32-bit arche= s. + * + * Context: The caller must hold inode->i_lock. + * + * NOTE: unlike netfs_read_zero_point(), netfs_write_zero_point() does need + * locking around it (normally i_rwsem), otherwise on 32bit/SMP an update = of + * i_size_seqcount can be lost, resulting in subsequent read calls spinning + * forever. + */ +static inline void netfs_write_zero_point(struct inode *inode, + unsigned long long zero_point) +{ + struct netfs_inode *ictx =3D netfs_inode(inode); + +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + write_seqcount_begin(&inode->i_size_seqcount); + ictx->_zero_point =3D zero_point; + write_seqcount_end(&inode->i_size_seqcount); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); + ictx->_zero_point =3D zero_point; + preempt_enable(); +#else + /* + * Pairs with smp_load_acquire() in netfs_read_zero_point() to + * ensure changes related to inode size (such as page contents) are + * visible before we see the changed inode size. 
+ */ + smp_store_release(&ictx->_zero_point, zero_point); +#endif +} + +/** + * netfs_read_sizes - Read remote_i_size and zero_point safely + * @inode: The inode to access + * @i_size: Where to return the local file size. + * @remote_i_size: Where to return the size of the file on the server + * @zero_point: Where to return the the point beyond which the server has = no data + * + * Read remote_i_size and zero_point safely without the potential for tear= ing + * on 32-bit arches. + * + * NOTE: in a 32bit arch with a preemptable kernel and an UP compile the + * i_size_read/write must be atomic with respect to the local cpu (unlike = with + * preempt disabled), but they don't need to be atomic with respect to oth= er + * cpus like in true SMP (so they need either to either locally disable irq + * around the read or for example on x86 they can be still implemented as a + * cmpxchg8b without the need of the lock prefix). For SMP compiles and 6= 4bit + * archs it makes no difference if preempt is enabled or not. 
+ */ +static inline void netfs_read_sizes(const struct inode *inode, + unsigned long long *i_size, + unsigned long long *remote_i_size, + unsigned long long *zero_point) +{ + const struct netfs_inode *ictx =3D container_of(inode, struct netfs_inode= , inode); +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + unsigned int seq; + + do { + seq =3D read_seqcount_begin(&inode->i_size_seqcount); + *i_size =3D inode->i_size; + *remote_i_size =3D ictx->_remote_i_size; + *zero_point =3D ictx->_zero_point; + } while (read_seqcount_retry(&inode->i_size_seqcount, seq)); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); + *i_size =3D inode->i_size; + *remote_i_size =3D ictx->_remote_i_size; + *zero_point =3D ictx->_zero_point; + preempt_enable(); +#else + /* Pairs with smp_store_release() in i_size_write() */ + *i_size =3D smp_load_acquire(&inode->i_size); + /* Pairs with smp_store_release() in netfs_write_remote_i_size() */ + *remote_i_size =3D smp_load_acquire(&ictx->_remote_i_size); + /* Pairs with smp_store_release() in netfs_write_zero_point() */ + *zero_point =3D smp_load_acquire(&ictx->_zero_point); +#endif +} + +/* + * netfs_write_sizes - Set i_size, remote_i_size and zero_point safely + * @inode: The inode to access + * @i_size: The new value for the local size of the file + * @remote_i_size: The new value for the size of the file on the server + * @zero_point: The new value for the point beyond which the server has no= data + * + * Set both remote_i_size and zero_point safely without the potential for + * tearing on 32-bit arches. + * + * Context: The caller must hold inode->i_lock. + * + * NOTE: unlike netfs_read_zero_point(), netfs_write_zero_point() does need + * locking around it (normally i_rwsem), otherwise on 32bit/SMP an update = of + * i_size_seqcount can be lost, resulting in subsequent read calls spinning + * forever. 
+ */ +static inline void netfs_write_sizes(struct inode *inode, + unsigned long long i_size, + unsigned long long remote_i_size, + unsigned long long zero_point) +{ + struct netfs_inode *ictx =3D netfs_inode(inode); + +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + write_seqcount_begin(&inode->i_size_seqcount); + inode->i_size =3D i_size; + ictx->_remote_i_size =3D remote_i_size; + ictx->_zero_point =3D zero_point; + write_seqcount_end(&inode->i_size_seqcount); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); + inode->i_size =3D i_size; + ictx->_remote_i_size =3D remote_i_size; + ictx->_zero_point =3D zero_point; + preempt_enable(); +#else + /* + * Pairs with smp_load_acquire() in i_size_read(), + * netfs_read_remote_i_size() and netfs_read_zero_point() to ensure + * changes related to inode size (such as page contents) are visible + * before we see the changed inode size. + */ + smp_store_release(&inode->i_size, i_size); + smp_store_release(&ictx->_remote_i_size, remote_i_size); + smp_store_release(&ictx->_zero_point, zero_point); +#endif +} + /** * netfs_inode_init - Initialise a netfslib inode context * @ctx: The netfs inode to initialise @@ -488,8 +736,8 @@ static inline void netfs_inode_init(struct netfs_inode = *ctx, bool use_zero_point) { ctx->ops =3D ops; - ctx->remote_i_size =3D i_size_read(&ctx->inode); - ctx->zero_point =3D LLONG_MAX; + ctx->_remote_i_size =3D i_size_read(&ctx->inode); + ctx->_zero_point =3D LLONG_MAX; ctx->flags =3D 0; atomic_set(&ctx->io_count, 0); #if IS_ENABLED(CONFIG_FSCACHE) @@ -498,7 +746,7 @@ static inline void netfs_inode_init(struct netfs_inode = *ctx, mutex_init(&ctx->wb_lock); /* ->releasepage() drives zero_point */ if (use_zero_point) { - ctx->zero_point =3D ctx->remote_i_size; + ctx->_zero_point =3D ctx->_remote_i_size; mapping_set_release_always(ctx->inode.i_mapping); } } @@ -511,13 +759,40 @@ static inline void netfs_inode_init(struct netfs_inod= e *ctx, * * Inform the netfs lib that a 
file got resized so that it can adjust its = state. */ -static inline void netfs_resize_file(struct netfs_inode *ctx, loff_t new_i= _size, +static inline void netfs_resize_file(struct netfs_inode *ictx, + unsigned long long new_i_size, bool changed_on_server) { +#if BITS_PER_LONG=3D=3D32 && defined(CONFIG_SMP) + struct inode *inode =3D &ictx->inode; + + preempt_disable(); + write_seqcount_begin(&inode->i_size_seqcount); + if (changed_on_server) + ictx->_remote_i_size =3D new_i_size; + if (new_i_size < ictx->_zero_point) + ictx->_zero_point =3D new_i_size; + write_seqcount_end(&inode->i_size_seqcount); + preempt_enable(); +#elif BITS_PER_LONG=3D=3D32 && defined(CONFIG_PREEMPTION) + preempt_disable(); if (changed_on_server) - ctx->remote_i_size =3D new_i_size; - if (new_i_size < ctx->zero_point) - ctx->zero_point =3D new_i_size; + ictx->_remote_i_size =3D new_i_size; + if (new_i_size < ictx->_zero_point) + ictx->_zero_point =3D new_i_size; + preempt_enable(); +#else + /* + * Pairs with smp_load_acquire() in netfs_read_remote_i_size and + * netfs_read_zero_point() to ensure changes related to inode size + * (such as page contents) are visible before we see the changed inode + * size. 
+	 */
+	if (changed_on_server)
+		smp_store_release(&ictx->_remote_i_size, new_i_size);
+	if (new_i_size < ictx->_zero_point)
+		smp_store_release(&ictx->_zero_point, new_i_size);
+#endif
 }
 
 /**

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Matthew Wilcox
Subject: [PATCH v6 06/24] netfs: Fix zeropoint update where i_size > remote_i_size
Date: Tue, 12 May 2026 13:33:43 +0100
Message-ID: <20260512123404.719402-7-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

Fix the update of the zero point[*] by netfs_release_folio() when there is
uncommitted data in the pagecache beyond the folio being released but the
on-server EOF is in this folio (ie. i_size > remote_i_size).

The update needs to limit zero_point to remote_i_size, not i_size as i_size
is a local phenomenon reflecting updates made locally to the pagecache, not
stuff written to the server.  remote_i_size tracks the server's i_size.

[*] The zero point is the file position from which we can assume that the
    server will just return zeros, so we can avoid generating reads.

Note that netfs_invalidate_folio() probably doesn't need fixing as
zero_point should be updated by setattr after truncation or fallocate.

Found with:

	fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \
	    /xfstest.test/junk --replay-ops=junk.fsxops

using the following as junk.fsxops:

	truncate 0x0 0x1bbae 0x82864
	write 0x3ef2e 0xf9c8 0x1bbae
	write 0x67e05 0xcb5a 0x4e8f6
	mapread 0x57781 0x85b6 0x7495f
	copy_range 0x5d3d 0x10329 0x54fac 0x7495f
	write 0x64710 0x1c2b 0x7495f
	mapread 0x64000 0x1000 0x7495f

on cifs with the default cache option.

It shows read-gaps on folio 0x64 failing with a short read (ie. it hits
EOF) if the FMODE_READ check is commented out in netfs_perform_write():

	if (//(file->f_mode & FMODE_READ) ||
	    netfs_is_cache_enabled(ctx)) {

and no fscache.  This was initially found with the generic/522 xfstest.
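
As an illustration of the clamping rule described above, here is a minimal
userspace sketch. It is not the kernel code: struct sizes_model and
release_folio_model() are hypothetical names invented for this example, and
locking and tearing protection are left out entirely. It only shows that the
zero point advances to the end of the released folio but is capped at
remote_i_size rather than at i_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; field names mirror the patch text only. */
struct sizes_model {
	uint64_t i_size;        /* local size, may include uncommitted data */
	uint64_t remote_i_size; /* size the server is believed to hold */
	uint64_t zero_point;    /* reads from here on are assumed to be zeros */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Model of the corrected rule: move zero_point up to the end of the folio
 * being released, but never beyond remote_i_size.  i_size is deliberately
 * not consulted.
 */
static void release_folio_model(struct sizes_model *s, uint64_t folio_next_pos)
{
	uint64_t end;

	assert(s != NULL);
	end = min_u64(folio_next_pos, s->remote_i_size);
	if (end > s->zero_point)
		s->zero_point = end;
}
```

With i_size = 0x70000, remote_i_size = 0x64800 and zero_point = 0x60000,
releasing a folio that ends at 0x65000 leaves zero_point at 0x64800 in this
model, whereas clamping to i_size would have moved it to 0x65000.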

Fixes: cce6bfa6ca0e ("netfs: Fix trimming of streaming-write folios in netfs_inval_folio()")
Signed-off-by: David Howells
cc: Paulo Alcantara
cc: Matthew Wilcox
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/misc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index bad661ff2bec..723571ca1b88 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -307,10 +307,10 @@ bool netfs_release_folio(struct folio *folio, gfp_t gfp)
 		return false;
 
 	netfs_read_sizes(inode, &i_size, &remote_i_size, &zero_point);
-	end = umin(folio_next_pos(folio), i_size);
+	end = folio_next_pos(folio);
 	if (end > zero_point) {
 		spin_lock(&inode->i_lock);
-		end = umin(folio_next_pos(folio), inode->i_size);
+		end = umin(end, ctx->_remote_i_size);
 		if (end > ctx->_zero_point)
 			netfs_write_zero_point(inode, end);
 		spin_unlock(&inode->i_lock);

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Viacheslav Dubeyko
Subject: [PATCH v6 07/24] netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call
Date: Tue, 12 May 2026 13:33:44 +0100
Message-ID: <20260512123404.719402-8-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

From: Viacheslav Dubeyko

Running the generic/013 test case repeatedly reproduces a kernel BUG at
mm/filemap.c:1504 with a probability of 30%.
while true; do
  sudo ./check generic/013
done

[ 9849.452376] page: refcount:3 mapcount:0 mapping:00000000e58ff252 index:0x10781 pfn:0x1c322
[ 9849.452412] memcg:ffff8881a1915800
[ 9849.452417] aops:ceph_aops ino:1000058db9e dentry name(?):"f9XXXXXX"
[ 9849.452432] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 9849.452441] raw: 0017ffffc0000000 0000000000000000 dead000000000122 ffff88816110d248
[ 9849.452445] raw: 0000000000010781 0000000000000000 00000003ffffffff ffff8881a1915800
[ 9849.452447] page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
[ 9849.452474] ------------[ cut here ]------------
[ 9849.452476] kernel BUG at mm/filemap.c:1504!
[ 9849.478635] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[ 9849.481772] CPU: 2 UID: 0 PID: 84223 Comm: fsstress Not tainted 7.0.0-rc1+ #18 PREEMPT(full)
[ 9849.482881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[ 9849.484539] RIP: 0010:folio_unlock+0x85/0xa0
[ 9849.485076] Code: 89 df 31 f6 e8 1c f3 ff ff 48 8b 5d f8 c9 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c6 80 6c d9 a7 48 89 df e8 4b b3 10 00 <0f> 0b 48 89 df e8 21 e6 2c 00 eb 9d 0f 1f 40 00 66 66 2e 0f 1f 84
[ 9849.493818] RSP: 0018:ffff8881bb8076b0 EFLAGS: 00010246
[ 9849.495740] RAX: 0000000000000000 RBX: ffffea00070c8980 RCX: 0000000000000000
[ 9849.498678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 9849.500559] RBP: ffff8881bb8076b8 R08: 0000000000000000 R09: 0000000000000000
[ 9849.501097] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000010782000
[ 9849.502108] R13: ffff8881935de738 R14: ffff88816110d010 R15: 0000000000001000
[ 9849.502516] FS: 00007e36cbe94740(0000) GS:ffff88824a899000(0000) knlGS:0000000000000000
[ 9849.502996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9849.503810] CR2: 000000c0002b0000 CR3: 000000011bbf6004 CR4: 0000000000772ef0
[ 9849.504459] PKRU: 55555554
[ 9849.504626] Call Trace:
[ 9849.505242]
[ 9849.505379] netfs_write_begin+0x7c8/0x10a0
[ 9849.505877] ? __kasan_check_read+0x11/0x20
[ 9849.506384] ? __pfx_netfs_write_begin+0x10/0x10
[ 9849.507178] ceph_write_begin+0x8c/0x1c0
[ 9849.507934] generic_perform_write+0x391/0x8f0
[ 9849.508503] ? __pfx_generic_perform_write+0x10/0x10
[ 9849.509062] ? file_update_time_flags+0x19a/0x4b0
[ 9849.509581] ? ceph_get_caps+0x63/0xf0
[ 9849.510259] ? ceph_get_caps+0x63/0xf0
[ 9849.510530] ceph_write_iter+0xe79/0x1ae0
[ 9849.511282] ? __pfx_ceph_write_iter+0x10/0x10
[ 9849.511839] ? lock_acquire+0x1ad/0x310
[ 9849.512334] ? ksys_write+0xf9/0x230
[ 9849.512582] ? lock_is_held_type+0xaa/0x140
[ 9849.513128] vfs_write+0x512/0x1110
[ 9849.513634] ? __fget_files+0x33/0x350
[ 9849.513893] ? __pfx_vfs_write+0x10/0x10
[ 9849.514143] ? mutex_lock_nested+0x1b/0x30
[ 9849.514394] ksys_write+0xf9/0x230
[ 9849.514621] ? __pfx_ksys_write+0x10/0x10
[ 9849.514887] ? do_syscall_64+0x25e/0x1520
[ 9849.515122] ? __kasan_check_read+0x11/0x20
[ 9849.515366] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.515655] __x64_sys_write+0x72/0xd0
[ 9849.515885] ? trace_hardirqs_on+0x24/0x1c0
[ 9849.516130] x64_sys_call+0x22f/0x2390
[ 9849.516341] do_syscall_64+0x12b/0x1520
[ 9849.516545] ? do_syscall_64+0x27c/0x1520
[ 9849.516783] ? do_syscall_64+0x27c/0x1520
[ 9849.517003] ? lock_release+0x318/0x480
[ 9849.517220] ? __x64_sys_io_getevents+0x143/0x2d0
[ 9849.517479] ? percpu_ref_put_many.constprop.0+0x8f/0x210
[ 9849.517779] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9849.518073] ? do_syscall_64+0x25e/0x1520
[ 9849.518291] ? __kasan_check_read+0x11/0x20
[ 9849.518519] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.518799] ? do_syscall_64+0x27c/0x1520
[ 9849.519024] ? local_clock_noinstr+0xf/0x120
[ 9849.519262] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9849.519544] ? do_syscall_64+0x25e/0x1520
[ 9849.519781] ? __kasan_check_read+0x11/0x20
[ 9849.520008] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.520273] ? do_syscall_64+0x27c/0x1520
[ 9849.520491] ? trace_hardirqs_on_prepare+0x178/0x1c0
[ 9849.520767] ? irqentry_exit+0x10c/0x6c0
[ 9849.520984] ? trace_hardirqs_off+0x86/0x1b0
[ 9849.521224] ? exc_page_fault+0xab/0x130
[ 9849.521472] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9849.521766] RIP: 0033:0x7e36cbd14907
[ 9849.521989] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 9849.523057] RSP: 002b:00007ffff2d2a968 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 9849.523484] RAX: ffffffffffffffda RBX: 000000000000e549 RCX: 00007e36cbd14907
[ 9849.523885] RDX: 000000000000e549 RSI: 00005bd797ec6370 RDI: 0000000000000004
[ 9849.524277] RBP: 0000000000000004 R08: 0000000000000047 R09: 00005bd797ec6370
[ 9849.524652] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000049
[ 9849.525062] R13: 0000000010781a37 R14: 00005bd797ec6370 R15: 0000000000000000
[ 9849.525447]
[ 9849.525574] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass ghash_clmulni_intel aesni_intel input_leds rapl mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 bochs qemu_fw_cfg i2c_smbus pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[ 9849.529150] ---[ end trace 0000000000000000 ]---
[ 9849.529502] RIP: 0010:folio_unlock+0x85/0xa0
[ 9849.530813] Code: 89 df 31 f6 e8 1c f3 ff ff 48 8b 5d f8 c9 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c6 80 6c d9 a7 48 89 df e8 4b b3 10 00 <0f> 0b 48 89 df e8 21 e6 2c 00 eb 9d 0f 1f 40 00 66 66 2e 0f 1f 84
[ 9849.534986] RSP: 0018:ffff8881bb8076b0 EFLAGS: 00010246
[ 9849.536198] RAX: 0000000000000000 RBX: ffffea00070c8980 RCX: 0000000000000000
[ 9849.537718] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 9849.539321] RBP: ffff8881bb8076b8 R08: 0000000000000000 R09: 0000000000000000
[ 9849.540862] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000010782000
[ 9849.542438] R13: ffff8881935de738 R14: ffff88816110d010 R15: 0000000000001000
[ 9849.543996] FS: 00007e36cbe94740(0000) GS:ffff88824b899000(0000) knlGS:0000000000000000
[ 9849.545854] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9849.547092] CR2: 00007e36cb3ff000 CR3: 000000011bbf6006 CR4: 0000000000772ef0
[ 9849.548679] PKRU: 55555554

The race sequence:
1. Read completes -> netfs_read_collection() runs
2. netfs_wake_rreq_flag(rreq, NETFS_RREQ_IN_PROGRESS, ...)
3. netfs_wait_for_read() returns -EFAULT to netfs_write_begin()
4. The netfs_unlock_abandoned_read_pages() unlocks the folio
5. netfs_write_begin() calls folio_unlock(folio) -> VM_BUG_ON_FOLIO()

The key reason for the issue is that netfs_unlock_abandoned_read_pages()
doesn't check the NETFS_RREQ_NO_UNLOCK_FOLIO flag and executes
folio_unlock() unconditionally. This patch implements logic in
netfs_unlock_abandoned_read_pages() similar to that of
netfs_unlock_read_folio().

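As a rough illustration only (not part of the patch), the unlock decision described above can be modelled in user space. The struct and function names below are hypothetical stand-ins, not the kernel's types; the sketch assumes the request records one folio index that the caller wants left locked, plus a flag saying whether that index is valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_folio {
	unsigned long index;
	bool locked;
};

struct model_rreq {
	unsigned long no_unlock_folio;	/* index the caller wants left locked */
	bool no_unlock_flag;		/* models NETFS_RREQ_NO_UNLOCK_FOLIO */
};

/* Unlock every abandoned folio except the one the caller still owns.
 * Returns the number of folios that were unlocked.
 */
static size_t model_unlock_abandoned(const struct model_rreq *rreq,
				     struct model_folio *folios, size_t n)
{
	size_t unlocked = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_folio *folio = &folios[i];

		if (rreq->no_unlock_flag &&
		    folio->index == rreq->no_unlock_folio)
			continue;	/* left locked for netfs_write_begin() */

		assert(folio->locked);	/* models the VM_BUG_ON_FOLIO() check */
		folio->locked = false;
		unlocked++;
	}
	return unlocked;
}
```

With the skip in place, the folio handed to the write path is still locked when the caller comes to unlock it, so the double unlock in step 5 of the race sequence cannot happen in this model.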
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Signed-off-by: Viacheslav Dubeyko Signed-off-by: David Howells Reviewed-by: Paulo Alcantara (Red Hat) cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: Ceph Development --- fs/netfs/read_retry.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c index 5ec548b996d6..e10eb5a07332 100644 --- a/fs/netfs/read_retry.c +++ b/fs/netfs/read_retry.c @@ -292,8 +292,15 @@ void netfs_unlock_abandoned_read_pages(struct netfs_io= _request *rreq) struct folio *folio =3D folioq_folio(p, slot); =20 if (folio && !folioq_is_marked2(p, slot)) { - trace_netfs_folio(folio, netfs_folio_trace_abandon); - folio_unlock(folio); + if (folio->index =3D=3D rreq->no_unlock_folio && + test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, + &rreq->flags)) { + _debug("no unlock"); + } else { + trace_netfs_folio(folio, + netfs_folio_trace_abandon); + folio_unlock(folio); + } } } } From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E947D388882 for ; Tue, 12 May 2026 12:35:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589310; cv=none; b=JUFnx+RavLA8tBUIuQhbF89H+guey+JTiCJhfQHXMBrZP8s914n4PLJF0lCaKgnjKrEZnEUR5r+T+gfk5RrsqmBOeEOmG8c5VljXEtipSLjrj9Y5JJe2ZDF278EiUsRcCwtVqSy4N1da4b+ILR/dThjCgxeRA4c3CSLTKI9czyQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589310; c=relaxed/simple; bh=sxih3BTDZzJ2bsUF2D6f64sfz9xkArtTKlvgnT0LUh0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=IKc7jdoG1D4HpTnKZIGeT6LU5m1icjMfLmMCY+GbNkXMwuiSS8ORnR3VJ0ToAERQIn5BM8/BsqyAJtEDD45tS4WSfXwPhTMD5RIL96ZhsUfB2WFVZ95LuVAHeiOkjtfi8LDQdAkdNMpf1X04CzsoJjTX4d9xnM2/zZl6H2iAuWY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=aUMN8/H9; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aUMN8/H9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589295; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MAmZUPUCJuuZNr89UZiegqkhZYwnj4y64aSMrWWXqcs=; b=aUMN8/H9y8edC5nz+RaZhAttdvuinx/dVI/anoIen5qHu5ysX6NpQwnmlNrAef8lIvyUrf eFpMurqiS6/I1OrupAQYrh4PjNczSDYakuK3t7AWVnUq4Y+GqLQ7WyAmUppOdI90zP0a5v 1ID8ELsEgXTcKdXSWv9LlMMOmjL/UQE= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-208-piGjhrQqNFu2y0lHo3mczQ-1; Tue, 12 May 2026 08:34:51 -0400 X-MC-Unique: piGjhrQqNFu2y0lHo3mczQ-1 X-Mimecast-MFC-AGG-ID: piGjhrQqNFu2y0lHo3mczQ_1778589290 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) 
(No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 08ACC19560A1; Tue, 12 May 2026 12:34:50 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 7509319560A6; Tue, 12 May 2026 12:34:46 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 08/24] netfs: Fix potential uninitialised var in netfs_extract_user_iter() Date: Tue, 12 May 2026 13:33:45 +0100 Message-ID: <20260512123404.719402-9-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" In netfs_extract_user_iter(), if it's given a zero-length iterator, it will fall through the loop without setting ret, and so the error handling behaviour will be undefined, depending on whether ret happens to be negative. The value of ret then propagates back up the callstack. Fix this by presetting ret to 0. 
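A minimal user-space sketch of the pattern, for illustration only: model_extract() and its chunk_len parameter are hypothetical and merely stand in for the extraction loop. The point it tries to show is that a loop which is skipped entirely for a zero-length request never assigns ret, so ret has to start out defined.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of an extraction loop.  chunk_len stands in for the
 * number of bytes each extraction call would hand back.
 */
static ssize_t model_extract(size_t count, size_t chunk_len)
{
	ssize_t ret = 0;	/* preset: the loop may not run at all */
	size_t npages = 0;

	while (count > 0) {
		size_t got = chunk_len < count ? chunk_len : count;

		if (got == 0) {
			ret = -1;	/* model an extraction failure */
			break;
		}
		ret = (ssize_t)got;
		count -= got;
		npages++;
	}

	/* Without the preset, this test would read an indeterminate value
	 * whenever count was zero on entry.
	 */
	if (ret < 0)
		return ret;
	return (ssize_t)npages;
}
```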
Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into= a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40= redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/iterator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index 154a14bb2d7f..6903028b7162 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -43,7 +43,7 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, si= ze_t orig_len, unsigned int max_pages; unsigned int npages =3D 0; unsigned int i; - ssize_t ret; + ssize_t ret =3D 0; size_t count =3D orig_len, offset, len; size_t bv_size, pg_size; From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D360F4A3412 for ; Tue, 12 May 2026 12:35:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589313; cv=none; b=MVzP1z2annO6DMGzeFgZ/UYg6JfKDnzors8b7f1O1taRBBeg85dH2cgsh9+woHYMYiuUjEOs22ud9WX5V8Ie/xJT5pGBnS8UWoQPm2GQ68n+qPXhYB2S7BRM/cl/iWNcpKXOi2RdmtY9x0sn4v0CoTBgqHT40LC+6no3c1v0XxA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589313; c=relaxed/simple; bh=853CF/ipdIOZWSBvml99qDygptnRkJVXQwOPCjZS9T4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A9Brj0KFqeVxgG16czyS4StngJPcFoBq4467if2VMoyKLaSvH3UNc2bT3157U1JEXstf7/cffLpNyAOXOIav1KQN/i11LxnsJMG/HeEmp1pBtsWnHDI1Axcb5s++hU9f/+brl8SKLmPlCNtjXCF0NfY1PsNF2TBvpZf6V0ziIzo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine 
dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WulKnF66; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WulKnF66" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589302; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Gap7WXFqPLGHG+h8WlKiOQ50JtpUJcaaxJqkfmLdmEA=; b=WulKnF66h8e3ifUUxgdzlHj86Tm5KSUAbolMZ8BOt0h8vDFSvQUTg+0lbuTY4E4fXKTC/q qVOXK77HP29jXo2eXV9UaykaE9MBk5ycQoqpHWyqx301/bq4bEvVlXk4xuY21By1uz/uae EfiP7+UxoXS7l5LJrlqdpNOP6DJe2qA= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-164-v1BCWvwDO-aUdYWifMv6FQ-1; Tue, 12 May 2026 08:34:56 -0400 X-MC-Unique: v1BCWvwDO-aUdYWifMv6FQ-1 X-Mimecast-MFC-AGG-ID: v1BCWvwDO-aUdYWifMv6FQ_1778589295 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1F46119560B1; Tue, 12 May 2026 12:34:55 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by 
mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id BE0AE180034E; Tue, 12 May 2026 12:34:51 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Xiaoli Feng , stable@vger.kernel.org Subject: [PATCH v6 09/24] netfs: fix error handling in netfs_extract_user_iter() Date: Tue, 12 May 2026 13:33:46 +0100 Message-ID: <20260512123404.719402-10-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" From: Paulo Alcantara In netfs_extract_user_iter(), if iov_iter_extract_pages() failed to extract user pages, bail out on -ENOMEM, otherwise return the error code only if @npages =3D=3D 0, allowing short DIO reads and writes to be issued. This fixes mmapstress02 from LTP tests against CIFS. Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into= a BVEC iterator") Reported-by: Xiaoli Feng Signed-off-by: Paulo Alcantara (Red Hat) Signed-off-by: David Howells Cc: netfs@lists.linux.dev Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org --- fs/netfs/iterator.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index 6903028b7162..429e4396e1b0 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -22,7 +22,7 @@ * * Extract the page fragments from the given amount of the source iterator= and * build up a second iterator that refers to all of those bits. 
This allo= ws - * the original iterator to disposed of. + * the original iterator to be disposed of. * * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-pee= r DMA be * allowed on the pages extracted. @@ -67,8 +67,8 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, si= ze_t orig_len, ret =3D iov_iter_extract_pages(orig, &pages, count, max_pages - npages, extraction_flags, &offset); - if (ret < 0) { - pr_err("Couldn't get user pages (rc=3D%zd)\n", ret); + if (unlikely(ret <=3D 0)) { + ret =3D ret ?: -EIO; break; } =20 @@ -97,6 +97,13 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, s= ize_t orig_len, npages +=3D cur_npages; } =20 + if (ret < 0 && (ret =3D=3D -ENOMEM || npages =3D=3D 0)) { + for (i =3D 0; i < npages; i++) + unpin_user_page(bv[i].bv_page); + kvfree(bv); + return ret; + } + iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count); return npages; } From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A5BF37DAB2 for ; Tue, 12 May 2026 12:35:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589318; cv=none; b=MUrFr+B4ouHU+MzCG9xhE3tlE4vScVyHltt4RPDhbX+qPOKIEepP6zh1BfWumY9lQLnfT/CHF2Scq4xPvgiKTwRN8hONeXwsRRHK3dDuiyi8M9pbHBwNGJZpoA34Mmwy27rO2B6R/+TQMX10+qRBF3V9btWaJ5/oLrWDcyU3vrU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589318; c=relaxed/simple; bh=3YvfC4SFqq5CAax8989wspHy9qRmH43JzGvB3HLN65o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=J7qQP+W1E3oxaDZJOH/VeBRpv/QgQV0L9djwKOqV2EugvdjVD9LytIVeFErL/MrQUTWEuZlJ7wqof9acrWrmZQ3g4Afd1qQd7cytCkvct1BZl0WEhm8p0VLXytABLZGdrwmlhf1NfQaJeo7hf6QfqpTVjartMnkeKHBS/jWLAi0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HzWgNRdC; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HzWgNRdC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589306; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XqTDgxS92higoqL65+640Zno11P+YXpAqTm+1gBq9A0=; b=HzWgNRdCMhFiFAhAFSsKmC5A07jAevJydyrMAXv80hfLyvA1KuJEINx5loAm/sUmMfOZ48 6JI+Hy/4OQJQe8ijXQ8HEl/v/5HQkOr3VIajnmQ3uwtmRowDGLmQlO/Gen/rQSYw6asZg9 711DQpOzJhRMFHvfgN1d74fi/XE1+Rk= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-198-r0kVuVHoNG68j78FkHt2fg-1; Tue, 12 May 2026 08:35:01 -0400 X-MC-Unique: r0kVuVHoNG68j78FkHt2fg-1 X-Mimecast-MFC-AGG-ID: r0kVuVHoNG68j78FkHt2fg_1778589299 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) 
(No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 85F6B195607B; Tue, 12 May 2026 12:34:59 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CF68419560A2; Tue, 12 May 2026 12:34:56 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 10/24] netfs: Fix overrun check in netfs_extract_user_iter() Date: Tue, 12 May 2026 13:33:47 +0100 Message-ID: <20260512123404.719402-11-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" Fix netfs_extract_user_iter() so that if iov_iter_extract_pages() overfills pages[], then those pages don't get included in the iterator constructed at the end of the function. If there was an overfill, memory corruption has already happened. 
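To illustrate the kind of bounds check involved, here is a user-space sketch under stated assumptions: a 4096-byte page, hypothetical helper names, and npages never exceeding max_pages on entry. It is not the kernel code; it only shows how a returned chunk can be validated against both the bytes requested and the slots left in the array before it is counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Number of pages spanned by len bytes that start offset bytes into the
 * first page (a round-up division).
 */
static unsigned int model_pages_spanned(size_t offset, size_t len)
{
	return (unsigned int)((offset + len + MODEL_PAGE_SIZE - 1) /
			      MODEL_PAGE_SIZE);
}

/* A chunk is acceptable only if it returns no more bytes than were asked
 * for and needs no more slots than remain.  Assumes npages <= max_pages.
 */
static bool model_chunk_fits(size_t offset, size_t got, size_t count,
			     unsigned int npages, unsigned int max_pages)
{
	if (got > count)
		return false;
	return model_pages_spanned(offset, got) <= max_pages - npages;
}
```

Comparing against max_pages - npages rather than adding npages to the new page count keeps the arithmetic from wrapping when the reported page count is very large.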
Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into= a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260427154639.180684-1-dhowells%40r= edhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/iterator.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index 429e4396e1b0..b375567e0520 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -72,21 +72,24 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, = size_t orig_len, break; } =20 - if (ret > count) { - pr_err("get_pages rc=3D%zd more than %zu\n", ret, count); + if (WARN(ret > count, + "%s: extract_pages overrun %zd > %zu bytes\n", + __func__, ret, count)) { + ret =3D -EIO; break; } =20 - count -=3D ret; - ret +=3D offset; - cur_npages =3D DIV_ROUND_UP(ret, PAGE_SIZE); - - if (npages + cur_npages > max_pages) { - pr_err("Out of bvec array capacity (%u vs %u)\n", - npages + cur_npages, max_pages); + cur_npages =3D DIV_ROUND_UP(offset + ret, PAGE_SIZE); + if (WARN(cur_npages > max_pages - npages, + "%s: extract_pages overrun %u > %u pages\n", + __func__, npages + cur_npages, max_pages)) { + ret =3D -EIO; break; } =20 + count -=3D ret; + ret +=3D offset; + for (i =3D 0; i < cur_npages; i++) { len =3D ret > PAGE_SIZE ? PAGE_SIZE : ret; bvec_set_page(bv + npages + i, *pages++, len - offset, offset); @@ -97,6 +100,11 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, = size_t orig_len, npages +=3D cur_npages; } =20 + /* Note: Don't try to clean up after EIO. Either we got no pages, so + * nothing to clean up, or we got a buffer overrun, memory corruption + * and can't trust the stuff in the buffer (a WARN was emitted). 
+ */ + if (ret < 0 && (ret =3D=3D -ENOMEM || npages =3D=3D 0)) { for (i =3D 0; i < npages; i++) unpin_user_page(bv[i].bv_page); From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4AD373672B9 for ; Tue, 12 May 2026 12:35:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589323; cv=none; b=gPKYRWXwl5lN3LFKaxFcosqKsS61nXtn2GQBgQeFuVpVhirCNCMRn0YeUx3Hz2nNmt/nddbfxET9WAONxy0OJYDBXIV648f55YZgQU+AL9z7RjxSDq4yRlTvyt9ppT4acbsU0Z2Tr+WyLAQHkyLuJOZjOFXLnYNlBVJ1COXL0RI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589323; c=relaxed/simple; bh=aT+cFasVFMB2oAShjKMPYVQBw9ncI+zRRn/9uYX3msE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PYG3XNDoto/0qYv8sNPvm29bc/CbCFCzwC7ajJHbG/XFpan/F0VJLAzVeQOR+DRQozdEjuBRbJGaKbY3I9uqmIAHnRWx2GSmugbQQvw0r2tkb3/houRV+tE0zVO20GUkSYfycKgJki9sFt/ip68Hr40JSLzpa0jRV05zn0EKqkc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=S7/tdLR8; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="S7/tdLR8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589312; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VkuyHj6lQaHKlWIeolrBlUXqKXoVflWqXfIrCs1c0R0=; b=S7/tdLR8UdgDdNMq4dxpmYq5dgSyT8xm88Ztj5LC//5kEXzvzQDEEe/vDnNwkuAUXbaGg4 wCjzOeetXvhgIj82WlYWor8E99TLNyb82cpUJh2RJiVOuSh9raIu8A+Mnmlrjz0PLRhfdA Lj4FePDIAxFCc6q/264UXjqvLaetorU= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-31-MitLMzY8N8mLuOlhI3ewzA-1; Tue, 12 May 2026 08:35:06 -0400 X-MC-Unique: MitLMzY8N8mLuOlhI3ewzA-1 X-Mimecast-MFC-AGG-ID: MitLMzY8N8mLuOlhI3ewzA_1778589305 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8AF6A1956061; Tue, 12 May 2026 12:35:04 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4BC601955D84; Tue, 12 May 2026 12:35:00 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Marc Dionne , Matthew Wilcox Subject: [PATCH v6 11/24] netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone Date: Tue, 12 May 2026 13:33:48 +0100 Message-ID: <20260512123404.719402-12-dhowells@redhat.com> In-Reply-To: 
<20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17
Content-Type: text/plain; charset="utf-8"

If a streaming write is made, this will leave the relevant modified folio
in a not-uptodate, but dirty state with a netfs_folio struct hung off of
folio->private indicating the dirty range. Subsequently truncating the
file such that the dirty data in the folio is removed, but the first part
of the folio theoretically remains will cause the netfs_folio struct to be
discarded... but will leave the dirty flag set.

If the folio is then read via mmap(), netfs_read_folio() will see that the
page is dirty and jump to netfs_read_gaps() to fill in the missing bits.
netfs_read_gaps(), however, expects there to be a netfs_folio struct
present and can oops because truncate removed it.

Fix this by calling folio_cancel_dirty() in netfs_invalidate_folio() in the
event that all the dirty data in the folio is erased (as nfs does).

Also add some tracepoints to log modifications to a dirty page.

This can be reproduced with something like:

	dd if=/dev/zero of=/xfstest.test/foo bs=1M count=1
	umount /xfstest.test
	mount /xfstest.test
	xfs_io -c "w 0xbbbf 0xf96c" \
	       -c "truncate 0xbbbf" \
	       -c "mmap -r 0xb000 0x11000" \
	       -c "mr 0xb000 0x11000" \
	       /xfstest.test/foo

with fscaching disabled (otherwise streaming writes are suppressed) and a
change to netfs_perform_write() to disallow streaming writes if the fd is
open O_RDWR:

	if (//(file->f_mode & FMODE_READ) || <--- comment this out
	    netfs_is_cache_enabled(ctx)) {

It should be reproducible even without this change, but it prevents the
above trivial xfs_io command from reproducing it.

Note that the initial dd is important: the file must start out sufficiently large that the zero-point logic doesn't just clear the gaps because it knows there's nothing in the file to read yet. Unmounting and mounting is needed to clear the pagecache (there are other ways to do that that may also work). This was initially reproduced with the generic/522 xfstest on some patches that remove the FMODE_READ restriction. Fixes: 9ebff83e6481 ("netfs: Prep to use folio->private for write grouping = and streaming write") Reported-by: Marc Dionne Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/misc.c | 6 +++++- include/trace/events/netfs.h | 4 ++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c index 723571ca1b88..24b20e80e9a8 100644 --- a/fs/netfs/misc.c +++ b/fs/netfs/misc.c @@ -263,6 +263,7 @@ void netfs_invalidate_folio(struct folio *folio, size_t= offset, size_t length) /* Move the start of the data. */ finfo->dirty_len =3D fend - iend; finfo->dirty_offset =3D offset; + trace_netfs_folio(folio, netfs_folio_trace_invalidate_front); return; } =20 @@ -271,12 +272,14 @@ void netfs_invalidate_folio(struct folio *folio, size= _t offset, size_t length) */ if (iend >=3D fend) { finfo->dirty_len =3D offset - fstart; + trace_netfs_folio(folio, netfs_folio_trace_invalidate_tail); return; } =20 /* A partial write was split. The caller has already zeroed * it, so just absorb the hole. 
*/ + trace_netfs_folio(folio, netfs_folio_trace_invalidate_middle); } return; =20 @@ -284,8 +287,9 @@ void netfs_invalidate_folio(struct folio *folio, size_t= offset, size_t length) netfs_put_group(netfs_folio_group(folio)); folio_detach_private(folio); folio_clear_uptodate(folio); + folio_cancel_dirty(folio); kfree(finfo); - return; + trace_netfs_folio(folio, netfs_folio_trace_invalidate_all); } EXPORT_SYMBOL(netfs_invalidate_folio); =20 diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 8c936fc575d5..0b702f74aefe 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -194,6 +194,10 @@ EM(netfs_folio_trace_copy_to_cache, "mark-copy") \ EM(netfs_folio_trace_end_copy, "end-copy") \ EM(netfs_folio_trace_filled_gaps, "filled-gaps") \ + EM(netfs_folio_trace_invalidate_all, "inval-all") \ + EM(netfs_folio_trace_invalidate_front, "inval-front") \ + EM(netfs_folio_trace_invalidate_middle, "inval-mid") \ + EM(netfs_folio_trace_invalidate_tail, "inval-tail") \ EM(netfs_folio_trace_kill, "kill") \ EM(netfs_folio_trace_kill_cc, "kill-cc") \ EM(netfs_folio_trace_kill_g, "kill-g") \ From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 76DCF306748 for ; Tue, 12 May 2026 12:35:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589328; cv=none; b=Yknbqc5CxBYc/bJzAP0aox+63psYAjBVMR8Bw5QAs3S8ZFlVqwh4u1D8QS7jBJIjqrcyY/koHPyKIQVZn1i9BWeS9QbAslaXa/jchxh4zpokvJ0IWX5Ju/FDy2bK/roTA6DDMZImHxLvGdl7cPCbuQGxnAW1W2nolOxQ0upYBK0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589328; c=relaxed/simple; 
bh=KeUasABkgEn3NQcjVtDzHTfFRZyrIiFqog4yscjBZRc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=awnebnC7WaBSvC4GNfEjn7EtMvR2u/y3mEJdH36oad2+8jbUClOC7Qf+fQLeMpH+hJLPmerCLmgxnzsp0z8BEWnMa+TG+fM2NYAlVKojQPPyq8TisnxHeRVgcsrofyjbm6+e7gmhqqnubxy6l4xNM1EMTKZE7kBbniqTZOepruk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=gBIbh7DH; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gBIbh7DH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589314; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0gmHNNP6/GH74ADWZurqVXHN7qp3bddGwDZsoBT42iI=; b=gBIbh7DHDvyClJT9/H7rcrhuAux/9IeWO6YjdQVs5u8O1xTAFNcsCb6ukr8vEHgKs/DNIS R1nYXY0F7BeyNA9gnTqq3+j1NFu0hOvXsvJFEIfPkStSRtasy4Wm3qvm+r4hXTe1tzosam nJQbAxJmqy0zo9IyhHYveKOpuz5JstU= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-478-q_3PvK5QP3u9LJ3SzEBl9g-1; Tue, 12 May 2026 08:35:11 -0400 X-MC-Unique: q_3PvK5QP3u9LJ3SzEBl9g-1 X-Mimecast-MFC-AGG-ID: q_3PvK5QP3u9LJ3SzEBl9g_1778589309 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with 
cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 47C2F1800610; Tue, 12 May 2026 12:35:09 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 35AE530001BB; Tue, 12 May 2026 12:35:05 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 12/24] netfs: Defer the emission of trace_netfs_folio() Date: Tue, 12 May 2026 13:33:49 +0100 Message-ID: <20260512123404.719402-13-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Content-Type: text/plain; charset="utf-8" Change netfs_perform_write() to keep the netfs_folio trace value in a variable and emit it later to make it easier to choose the value displayed. This is a prerequisite for a subsequent patch. 
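
The shape of this change can be sketched in user space. This is only an illustrative model, not the kernel code: enum folio_trace, emit_trace() and model_write() are hypothetical stand-ins for enum netfs_folio_trace, trace_netfs_folio() and the branches in netfs_perform_write(). Each branch only records which value it would report, and the single emission happens at the common exit label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace values standing in for enum netfs_folio_trace. */
enum folio_trace {
	TRACE_UPTODATE,
	TRACE_WHOLE_FOLIO,
	TRACE_STREAMING,
};

static int emitted_count;		/* number of emissions so far */
static enum folio_trace last_emitted;	/* most recently emitted value */

/* Stand-in for the tracepoint: just record what was emitted. */
static void emit_trace(enum folio_trace t)
{
	last_emitted = t;
	emitted_count++;
}

/* Each branch picks a trace value; emission is deferred to one place. */
enum folio_trace model_write(bool uptodate, size_t offset,
			     size_t copied, size_t flen)
{
	enum folio_trace trace;

	if (uptodate) {
		trace = TRACE_UPTODATE;
		goto done;
	}
	if (offset == 0 && copied == flen) {
		trace = TRACE_WHOLE_FOLIO;
		goto done;
	}
	trace = TRACE_STREAMING;
done:
	emit_trace(trace);
	return trace;
}
```

Because the value sits in a local variable until the exit label, a later change can adjust it after a branch has been chosen without touching the emission site.
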
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40= redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_write.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index b6ecd059dc4f..278aeb074e75 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -149,6 +149,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, } =20 do { + enum netfs_folio_trace trace; struct netfs_folio *finfo; struct netfs_group *group; unsigned long long fpos; @@ -222,7 +223,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, if (unlikely(copied =3D=3D 0)) goto copy_failed; netfs_set_group(folio, netfs_group); - trace_netfs_folio(folio, netfs_folio_is_uptodate); + trace =3D netfs_folio_is_uptodate; goto copied; } =20 @@ -238,7 +239,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, folio_zero_segment(folio, offset + copied, flen); __netfs_set_group(folio, netfs_group); folio_mark_uptodate(folio); - trace_netfs_folio(folio, netfs_modify_and_clear); + trace =3D netfs_modify_and_clear; goto copied; } =20 @@ -256,7 +257,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, } __netfs_set_group(folio, netfs_group); folio_mark_uptodate(folio); - trace_netfs_folio(folio, netfs_whole_folio_modify); + trace =3D netfs_whole_folio_modify; goto copied; } =20 @@ -283,7 +284,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, if (unlikely(copied =3D=3D 0)) goto copy_failed; netfs_set_group(folio, netfs_group); - trace_netfs_folio(folio, netfs_just_prefetch); + trace =3D netfs_just_prefetch; goto copied; } =20 @@ -297,7 +298,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, if (offset =3D=3D 0 && copied =3D=3D flen) 
{ __netfs_set_group(folio, netfs_group); folio_mark_uptodate(folio); - trace_netfs_folio(folio, netfs_streaming_filled_page); + trace =3D netfs_streaming_filled_page; goto copied; } =20 @@ -312,7 +313,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, finfo->dirty_len =3D copied; folio_attach_private(folio, (void *)((unsigned long)finfo | NETFS_FOLIO_INFO)); - trace_netfs_folio(folio, netfs_streaming_write); + trace =3D netfs_streaming_write; goto copied; } =20 @@ -332,9 +333,9 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, folio_detach_private(folio); folio_mark_uptodate(folio); kfree(finfo); - trace_netfs_folio(folio, netfs_streaming_cont_filled_page); + trace =3D netfs_streaming_cont_filled_page; } else { - trace_netfs_folio(folio, netfs_streaming_write_cont); + trace =3D netfs_streaming_write_cont; } goto copied; } @@ -350,6 +351,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, continue; =20 copied: + trace_netfs_folio(folio, trace); flush_dcache_folio(folio); =20 /* Update the inode size if we moved the EOF marker */ From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 57538306775 for ; Tue, 12 May 2026 12:35:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589335; cv=none; b=iJNgcSFS4dJrCuAeGlTnF9kA2lbAo3FUFBs594i5UyQfdeky13wVBVhG1N+RpBigA/S0mO9ZHeoBBGUoM10+UdSGIP+7OoK6rAbDvc97flO510qOTzz+Wk339zDgEbPc9GFrGVHi2DrNSKoimxTrm6ojOm9HBTNr345ETe0KwgM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589335; c=relaxed/simple; bh=UE4zIV0YAXrHVNHPQJYZO0ZSJwaTkjaMnwWY/r5PYqA=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RD5TAm4rdq72uDuSchzNCo1e407z7SENCJVJtSi4OTz3YgPw/aK4EKSxeoG+qStZSK6Sjpbx+P/T8Y3Hfxymfk3ZBIq1hr8H85IJAD5K/WjvKpWdOZ3GFxfKhXp/zTZmbovM+TUWdcggiJXSFUc0gybRCC+qdizq89UDC1D3CJM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=cQFGFd83; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="cQFGFd83" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589321; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eyaICTHKp6jowDY0Echt9h/9zi2xk13bbyeUvASCiAE=; b=cQFGFd834RP9Z/Eh65/JAL2cgNxtW8Ip2JvrY2AihyFHhupNEri3Xo+9PgLARsotIgJCih 1XNlz1cmayuOgdmcfuHV5W27VLnUFlJyRYo2bF5KXlXdahP8hK/SZNyf/UgHaClYj7j5e0 14U3pTsA8Vt2urFfkYEURv8mquzKjE0= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-261-GMefGmEjNpSmELHE0VzEgQ-1; Tue, 12 May 2026 08:35:15 -0400 X-MC-Unique: GMefGmEjNpSmELHE0VzEgQ-1 X-Mimecast-MFC-AGG-ID: GMefGmEjNpSmELHE0VzEgQ_1778589314 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) 
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1FEF01955DE2; Tue, 12 May 2026 12:35:14 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 15C0A1955D84; Tue, 12 May 2026 12:35:10 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 13/24] netfs: Fix streaming write being overwritten Date: Tue, 12 May 2026 13:33:50 +0100 Message-ID: <20260512123404.719402-14-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" In order to avoid reading whilst writing, netfslib will allow "streaming writes" in which dirty data is stored directly into folios without reading them first. Such folios are marked dirty but may not be marked uptodate. If a folio is entirely written by a streaming write, uptodate will be set, otherwise it will have a netfs_folio struct attached to ->private recording the dirty region. In the event that a partially written streaming write page is to be overwritten entirely by a single write(), netfs_perform_write() will try to copy over it, but doesn't discard the netfs_folio if it succeeds; further, it doesn't correctly handle a partial copy that overwrites some of the dirty data. 
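
To make the dirty-region bookkeeping concrete, here is a simplified user-space sketch of how a whole-folio overwrite of a partially written streaming folio might be classified. It is an assumption-laden model, not the kernel implementation: struct model_finfo, enum model_outcome and model_whole_folio_overwrite() are hypothetical names, and the write is assumed to start at offset 0 and to ask for exactly flen bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the dirty-region record on a streaming folio. */
struct model_finfo {
	size_t dirty_offset;	/* start of dirty data within the folio */
	size_t dirty_len;	/* length of the dirty data */
};

enum model_outcome {
	MODEL_FILLED,	/* folio fully covered: drop the record, set uptodate */
	MODEL_RETRY,	/* short copy stopped at or before the dirty data: ignore it */
	MODEL_PARTIAL,	/* short copy ran into the dirty data: keep a merged record */
	MODEL_FAILED,	/* nothing was copied at all */
};

/*
 * Classify an attempt to overwrite a whole folio of length flen from
 * offset 0, where only the first 'copied' bytes actually arrived.
 * finfo may be NULL if the folio carries no dirty-region record.
 */
enum model_outcome model_whole_folio_overwrite(struct model_finfo *finfo,
					       size_t flen, size_t copied)
{
	size_t dirty_end;

	if (copied == flen)
		return MODEL_FILLED;
	if (copied == 0)
		return MODEL_FAILED;
	if (!finfo || copied <= finfo->dirty_offset)
		return MODEL_RETRY;

	/* Merge [0, copied) with the existing dirty region. */
	dirty_end = finfo->dirty_offset + finfo->dirty_len;
	if (dirty_end < copied)
		dirty_end = copied;
	finfo->dirty_offset = 0;
	finfo->dirty_len = dirty_end;
	return dirty_end == flen ? MODEL_FILLED : MODEL_PARTIAL;
}
```

In this model a short copy that reaches into the old dirty data can no longer be discarded, because the old bytes have already been replaced; the record is widened to start at 0 instead.
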
Fix this by the following: (1) If the folio is successfully overwritten, free the netfs_folio struct before marking the page uptodate. (2) If the copy to the folio partially fails, but short of the dirty data, just ignore the copy. (3) If the copy partially fails and overwrites some of the dirty data, accept the copy, update the netfs_folio struct to record the new data. If the folio is now filled, free the netfs_folio and set uptodate, otherwise return a partial write. Found with: fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \ /xfstest.test/junk --replay-ops=3Djunk.fsxops using the following as junk.fsxops: truncate 0x0 0 0x927c0 write 0x63fb8 0x53c8 0 copy_range 0xb704 0x19b9 0x24429 0x79380 write 0x2402b 0x144a2 0x90660 * write 0x204d5 0x140a0 0x927c0 * copy_range 0x1f72c 0x137d0 0x7a906 0x927c0 * read 0x00000 0x20000 0x9157c read 0x20000 0x20000 0x9157c read 0x40000 0x20000 0x9157c read 0x60000 0x20000 0x9157c read 0x7e1a0 0xcfb9 0x9157c on cifs with the default cache option. It shows folio 0x24 misbehaving if the FMODE_READ check is commented out in netfs_perform_write(): if (//(file->f_mode & FMODE_READ) || netfs_is_cache_enabled(ctx)) { and no fscache. This was initially found with the generic/522 xfstest. Fixes: 8f52de0077ba ("netfs: Reduce number of conditional branches in netfs= _perform_write()") Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_write.c | 47 ++++++++++++++++++++++++++---------- include/trace/events/netfs.h | 3 +++ 2 files changed, 37 insertions(+), 13 deletions(-) diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index 278aeb074e75..991552724868 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -246,18 +246,38 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struc= t iov_iter *iter, /* See if we can write a whole folio in one go. 
*/ if (!maybe_trouble && offset =3D=3D 0 && part >=3D flen) { copied =3D copy_folio_from_iter_atomic(folio, offset, part, iter); - if (unlikely(copied =3D=3D 0)) + if (likely(copied =3D=3D part)) { + if (finfo) { + trace =3D netfs_whole_folio_modify_filled; + goto folio_now_filled; + } + __netfs_set_group(folio, netfs_group); + folio_mark_uptodate(folio); + trace =3D netfs_whole_folio_modify; + goto copied; + } + if (copied =3D=3D 0) goto copy_failed; - if (unlikely(copied < part)) { + if (!finfo || copied <=3D finfo->dirty_offset) { maybe_trouble =3D true; iov_iter_revert(iter, copied); copied =3D 0; folio_unlock(folio); goto retry; } - __netfs_set_group(folio, netfs_group); - folio_mark_uptodate(folio); - trace =3D netfs_whole_folio_modify; + + /* We overwrote some existing dirty data, so we have to + * accept the partial write. + */ + finfo->dirty_len +=3D finfo->dirty_offset; + if (finfo->dirty_len =3D=3D flen) { + trace =3D netfs_whole_folio_modify_filled_efault; + goto folio_now_filled; + } + if (copied > finfo->dirty_len) + finfo->dirty_len =3D copied; + finfo->dirty_offset =3D 0; + trace =3D netfs_whole_folio_modify_efault; goto copied; } =20 @@ -327,16 +347,10 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struc= t iov_iter *iter, goto copy_failed; finfo->dirty_len +=3D copied; if (finfo->dirty_offset =3D=3D 0 && finfo->dirty_len =3D=3D flen) { - if (finfo->netfs_group) - folio_change_private(folio, finfo->netfs_group); - else - folio_detach_private(folio); - folio_mark_uptodate(folio); - kfree(finfo); trace =3D netfs_streaming_cont_filled_page; - } else { - trace =3D netfs_streaming_write_cont; + goto folio_now_filled; } + trace =3D netfs_streaming_write_cont; goto copied; } =20 @@ -350,6 +364,13 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct= iov_iter *iter, goto out; continue; =20 + folio_now_filled: + if (finfo->netfs_group) + folio_change_private(folio, finfo->netfs_group); + else + folio_detach_private(folio); + 
folio_mark_uptodate(folio); + kfree(finfo); copied: trace_netfs_folio(folio, trace); flush_dcache_folio(folio); diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 0b702f74aefe..aa9940ba307b 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -177,6 +177,9 @@ EM(netfs_folio_is_uptodate, "mod-uptodate") \ EM(netfs_just_prefetch, "mod-prefetch") \ EM(netfs_whole_folio_modify, "mod-whole-f") \ + EM(netfs_whole_folio_modify_efault, "mod-whole-f!") \ + EM(netfs_whole_folio_modify_filled, "mod-whole-f+") \ + EM(netfs_whole_folio_modify_filled_efault, "mod-whole-f+!") \ EM(netfs_modify_and_clear, "mod-n-clear") \ EM(netfs_streaming_write, "mod-streamw") \ EM(netfs_streaming_write_cont, "mod-streamw+") \ From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9002639734A for ; Tue, 12 May 2026 12:35:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589332; cv=none; b=uPNQmzGKQ5QSFWDGDITSk+VlCk6pOGrkke9+PDM1dafaExm8xR2gaJwfla+Drj5nnWFYIKbTLeJk5PxHTtaDfV/IFgdxxUZsc9MqUZOBmUh56TysbUggg28QdPOw58cq2j/MC4ZU2GJj+wIpmZTyy0aLYzUCDL8YLRhiCR9w2Vc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589332; c=relaxed/simple; bh=Ra6U6gVayYC75UIDtEgw+2ZMKy6gd1WCP5YcWQqvOqA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QIq03rr1AreFhNbZ3FgqwpefwxacRshnhKMnmWYtZkzLJVayokVsxEq9qf+4pXPhmYKSlUzNxIAiOeiIZgDtXAsBTlxn/N5QXZGDZEloyNR0ZKkLTMT9PIRXVhiWf55lpd6NMvTMcIpbz/5Qm0IgHOv3pwifnPpW5neIAQkDi7Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; 
spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QmwdcXYc; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QmwdcXYc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589323; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OV31Ucptd7+GGbgBqJ/Bk1eVcVLZSLeP7GQONRl+DJ0=; b=QmwdcXYcZ846TKY4c7cBTyKmqj2OwH4m9Nh4Gjsb3y2MiYxogEL5dwbFRto8CRSksgoAIt 0X+15PliS0vwxI0eOK6UkJCnzUek16ZE+O+/kowSeoDscxa72ttEaSXCT8I46cHD7nl2II U4W80mGJ+OATDGri+gLgHhAhNLoxWYo= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-121-wJPqa3k-MquIX1UU--azBA-1; Tue, 12 May 2026 08:35:20 -0400 X-MC-Unique: wJPqa3k-MquIX1UU--azBA-1 X-Mimecast-MFC-AGG-ID: wJPqa3k-MquIX1UU--azBA_1778589318 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8653F195609F; Tue, 12 May 2026 12:35:18 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com 
(Postfix) with ESMTP id CF8441800591; Tue, 12 May 2026 12:35:15 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 14/24] netfs: Fix potential deadlock in write-through mode Date: Tue, 12 May 2026 13:33:51 +0100 Message-ID: <20260512123404.719402-15-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Fix netfs_advance_writethrough() to always unlock the supplied folio and to mark it dirty if it isn't yet written to the end. Unfortunately, it can't be marked for writeback until the folio is done with as that may cause a deadlock against mmapped reads and writes. Even though it has been marked dirty, premature writeback can't occur as the caller is holding both inode->i_rwsem (which will prevent concurrent truncation, fallocation, DIO and other writes) and ictx->wb_lock (which will cause flushing to wait and writeback to skip or wait). Note that this may be easier to deal with once the queuing of folios is split from the generation of subrequests. 
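
The folio state transitions described above can be modeled in user space roughly as follows. This is a hedged sketch rather than the kernel code: struct model_folio, model_write_folio() and model_advance() are hypothetical stand-ins, the flags are plain booleans instead of real folio flags, and the supplied folio is assumed to be locked on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a folio's relevant state. */
struct model_folio {
	int refs;	/* reference count */
	bool locked;
	bool dirty;
	bool writeback;
};

/* Model of handing the folio over: start writeback, then unlock. */
static void model_write_folio(struct model_folio *f)
{
	f->writeback = true;
	f->locked = false;
}

/*
 * Model of advancing a write-through: the folio is always unlocked on
 * return.  If it is not yet written to its end it is only marked dirty;
 * writeback starts once the folio is finished with.
 */
void model_advance(struct model_folio **cache, struct model_folio *folio,
		   bool to_page_end)
{
	if (*cache != folio) {
		if (*cache) {
			/* A different folio was cached: drop our ref on it. */
			(*cache)->refs--;
			*cache = NULL;
		}
		*cache = folio;
		folio->refs++;	/* hold a ref while the folio is cached */
	}

	if (!to_page_end) {
		folio->dirty = true;
		folio->locked = false;
		return;
	}

	model_write_folio(folio);
	(*cache)->refs--;
	*cache = NULL;
}
```

The point the model tries to capture is that the writeback flag is never set while further writes to the same folio may still follow, so nothing can end up waiting on writeback for a folio that the writer has not finished with.
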
Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Closes: https://sashiko.dev/#/patchset/20260427154639.180684-1-dhowells%40r= edhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/write_issue.c | 39 +++++++++++++++++++++++++-------------- 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c index b0e9690bb90c..03961622996b 100644 --- a/fs/netfs/write_issue.c +++ b/fs/netfs/write_issue.c @@ -414,12 +414,7 @@ static int netfs_write_folio(struct netfs_io_request *= wreq, if (streamw) netfs_issue_write(wreq, cache); =20 - /* Flip the page to the writeback state and unlock. If we're called - * from write-through, then the page has already been put into the wb - * state. - */ - if (wreq->origin =3D=3D NETFS_WRITEBACK) - folio_start_writeback(folio); + folio_start_writeback(folio); folio_unlock(folio); =20 if (fgroup =3D=3D NETFS_FOLIO_COPY_TO_CACHE) { @@ -647,29 +642,41 @@ int netfs_advance_writethrough(struct netfs_io_reques= t *wreq, struct writeback_c struct folio *folio, size_t copied, bool to_page_end, struct folio **writethrough_cache) { + int ret; + _enter("R=3D%x ic=3D%zu ws=3D%u cp=3D%zu tp=3D%u", wreq->debug_id, wreq->buffer.iter.count, wreq->wsize, copied, to_p= age_end); =20 - if (!*writethrough_cache) { - if (folio_test_dirty(folio)) - /* Sigh. mmap. */ - folio_clear_dirty_for_io(folio); + /* The folio is locked. */ =20 + if (*writethrough_cache !=3D folio) { + if (*writethrough_cache) { + /* Did the folio get moved? */ + folio_put(*writethrough_cache); + *writethrough_cache =3D NULL; + } /* We can make multiple writes to the folio... 
*/ - folio_start_writeback(folio); if (wreq->len =3D=3D 0) trace_netfs_folio(folio, netfs_folio_trace_wthru); else trace_netfs_folio(folio, netfs_folio_trace_wthru_plus); *writethrough_cache =3D folio; + folio_get(folio); } =20 wreq->len +=3D copied; - if (!to_page_end) + + if (!to_page_end) { + folio_mark_dirty(folio); + folio_unlock(folio); return 0; + } =20 + ret =3D netfs_write_folio(wreq, wbc, folio); + folio_put(*writethrough_cache); *writethrough_cache =3D NULL; - return netfs_write_folio(wreq, wbc, folio); + wreq->submitted =3D wreq->len; + return ret; } =20 /* @@ -683,8 +690,12 @@ ssize_t netfs_end_writethrough(struct netfs_io_request= *wreq, struct writeback_c =20 _enter("R=3D%x", wreq->debug_id); =20 - if (writethrough_cache) + if (writethrough_cache) { + folio_lock(writethrough_cache); netfs_write_folio(wreq, wbc, writethrough_cache); + folio_put(writethrough_cache); + wreq->submitted =3D wreq->len; + } =20 netfs_end_issue_write(wreq); From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28E5837DAB4 for ; Tue, 12 May 2026 12:35:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589339; cv=none; b=cmMSpbdUTocORWCt4yL84oYR967Aw2IvW5rpMAT1k7IIsNsgmiunQipTkToKw0XOrdoyENUGrsxkv4T/YDHwP7f77tkorpwUooI3K2JSyhOiWA7fvn+GZ+L9ZNIwdi9x3ZSHP0Zf+EoMYYoHq8hksgoyeZSA+Vaxp17qhCPFAgk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589339; c=relaxed/simple; bh=Jp3QDQo/A0dPoJRZkDfmNDwfgZeVfsEuuiJwVac4Il8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=RhySt3j0oAFN2DlgwQHvadnzmLdwAzA53CHMWODkEtnpgWXDXuie2E+hkiU1NGHFcwGSvsT5yPP+qP25MRHT6agWgRNMS/y6+I40/cnp1jSK3/HorEG02EqKtPPTHQKjjTBN7QyuBoyCsnMF9i6w4f82R1dxuMMA8IRX1uKm1J4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=hcFEKcoc; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hcFEKcoc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589331; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2L5STRbVcsUYFv2ZB+WgCbwYuC/Gpxza2z9Fbl/lWXE=; b=hcFEKcocHBMGGcSCm3TbMf6e8jQN23Ka0Ng8xRf3oCyVKldBqRclfyofyDjAx+AFWj/R36 WwaMX+S1BdJLjfSkfLWQOYigL0LFLs3C4acbioU6j2DF7tlArnxnJLW/HHwmCBnkOGCM10 A6OrVk13jwCWxj3UG5vUovLyK2rNa9U= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-668-_yT0-Y8WPEeMFlJIxkOalA-1; Tue, 12 May 2026 08:35:25 -0400 X-MC-Unique: _yT0-Y8WPEeMFlJIxkOalA-1 X-Mimecast-MFC-AGG-ID: _yT0-Y8WPEeMFlJIxkOalA_1778589323 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest 
SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 357D61956055; Tue, 12 May 2026 12:35:23 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 437F3180034E; Tue, 12 May 2026 12:35:20 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 15/24] netfs: Fix read-gaps to remove netfs_folio from filled folio Date: Tue, 12 May 2026 13:33:52 +0100 Message-ID: <20260512123404.719402-16-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Fix netfs_read_gaps() to remove the netfs_folio record from the folio before marking the folio uptodate if it successfully fills the gaps around the dirty data in a streaming write folio (dirty, but not uptodate). Found with: fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \ /xfstest.test/junk --replay-ops=3Djunk.fsxops using the following as junk.fsxops: truncate 0x0 0x138b1 0x8b15d * write 0x507ee 0x10df7 0x927c0 write 0x19993 0x10e04 0x927c0 * mapwrite 0x66214 0x1a253 0x927c0 copy_range 0xb704 0x89b9 0x24429 0x79380 write 0x2402b 0x144a2 0x90660 * mapwrite 0x204d5 0x140a0 0x927c0 * copy_range 0x1f72c 0x137d0 0x7a906 0x927c0 * read 0 0x9157c 0x9157c on cifs with the default cache option.
It shows folio 0x24 misbehaving if the FMODE_READ check is commented out in netfs_perform_write(): if (//(file->f_mode & FMODE_READ) || netfs_is_cache_enabled(ctx)) { and no fscache. This was initially found with the generic/522 xfstest. Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") --- fs/netfs/buffered_read.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index ebd84a6cc3f0..51f844bfbdff 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -395,6 +395,7 @@ static int netfs_read_gaps(struct file *file, struct fo= lio *folio) { struct netfs_io_request *rreq; struct address_space *mapping =3D folio->mapping; + struct netfs_group *group =3D netfs_folio_group(folio); struct netfs_folio *finfo =3D netfs_folio_info(folio); struct netfs_inode *ctx =3D netfs_inode(mapping->host); struct folio *sink =3D NULL; @@ -461,6 +462,12 @@ static int netfs_read_gaps(struct file *file, struct f= olio *folio) =20 ret =3D netfs_wait_for_read(rreq); if (ret >=3D 0) { + if (group) + folio_change_private(folio, group); + else + folio_detach_private(folio); + kfree(finfo); + trace_netfs_folio(folio, netfs_folio_trace_filled_gaps); flush_dcache_folio(folio); folio_mark_uptodate(folio); } @@ -496,10 +503,8 @@ int netfs_read_folio(struct file *file, struct folio *= folio) struct netfs_inode *ctx =3D netfs_inode(mapping->host); int ret; =20 - if (folio_test_dirty(folio)) { - trace_netfs_folio(folio, netfs_folio_trace_read_gaps); + if (folio_test_dirty(folio)) return netfs_read_gaps(file, folio); - } =20 _enter("%lx", folio->index); From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client 
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 31C7F39732F for ; Tue, 12 May 2026 12:35:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589343; cv=none; b=kUSxT2g9Yjn1ch00KtspGEIVk8CrpTEDhlu0b987WaqXhL7F1ZTi/smSAhqKQMefaB4oG7sBye4QT2CBzalEKHXTfvGTTaEu3pJULUZ1AomrE6Q7hm1zWsLfQJSxa52TnhitS1sxBUxLNGw/EFjLzyESqCF4XqThwbXBX40pag8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589343; c=relaxed/simple; bh=rj2DI6YQj1WOpZcBEWurUKGYlDVO2t1wDhhzf23NyoE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DgYFpoDDAMgPeBF6aSK5EDQQJTdDI310Wm/pJB7nz27q50D1MJqk+6NdZho1090J9ajO9+yqqg8Ikf2DOf4qUP5Z3/oQZYe8r1LTmiVQ3XvQeJQBE5KAhOYeza0+xPyzZxP4B0hgIameUg6Q4RlVPxQWT9LCZ+dVaejiR/cKx54= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=EA2dL8Vu; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EA2dL8Vu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589332; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XwYo3BIaKuiKcA6D145NDy8jxIJhkEUgnVmrfuQ1EDA=; b=EA2dL8Vu/3/toXoaQb2RgIvaYw7grEfNbfK1fmd26wtN+UbsJ0KECivTUwvrYuMAgWtcan 
LGoQ1afu65DhGGjOMreHeGj7v9OEoxz2/pQHSxbQaonTCkYcxnlKA1CK6e+M44c5SbjRKu yNl7yDYwfl/hm7eppROE6roMPPWafXE= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-688-uLVT08qzMQSQ9i0sTUn_Gg-1; Tue, 12 May 2026 08:35:29 -0400 X-MC-Unique: uLVT08qzMQSQ9i0sTUn_Gg-1 X-Mimecast-MFC-AGG-ID: uLVT08qzMQSQ9i0sTUn_Gg_1778589328 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DA9081800451; Tue, 12 May 2026 12:35:27 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E56781955D84; Tue, 12 May 2026 12:35:24 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 16/24] netfs: Fix write streaming disablement if fd open O_RDWR Date: Tue, 12 May 2026 13:33:53 +0100 Message-ID: <20260512123404.719402-17-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" In netfs_perform_write(), "write 
streaming" (the caching of dirty data in dirty but !uptodate folios) is
performed to avoid the need to read data that is just going to get
immediately overwritten. However, this is/will be disabled in three
circumstances: if the fd is open O_RDWR, if fscache is in use (as we need
to round out the blocks for DIO) or if content encryption is enabled
(again for rounding out purposes).

The idea behind disabling it if the fd is open O_RDWR is that we'd need to
flush the write-streaming page before we could read the data, particularly
through mmap. But netfs now fills in the gaps if ->read_folio() is called
on the page, so that is unnecessary. Further, this doesn't actually work
if a separate fd is open for reading.

Fix this by removing the check for O_RDWR, thereby allowing streaming
writes even when we might read.

This caused a number of problems with the generic/522 xfstest, but those
are now fixed.

Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: David Howells
cc: Paulo Alcantara
cc: Matthew Wilcox
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_write.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 991552724868..f79fb5996540 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -203,11 +203,11 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 		}
 
 		/* Decide how we should modify a folio. We might be attempting
-		 * to do write-streaming, in which case we don't want to a
-		 * local RMW cycle if we can avoid it. If we're doing local
-		 * caching or content crypto, we award that priority over
-		 * avoiding RMW. If the file is open readably, then we also
-		 * assume that we may want to read what we wrote.
+		 * to do write-streaming, as we don't want to a local RMW cycle
+		 * if we can avoid it. If we're doing local caching or content
+		 * crypto, we award that priority over avoiding RMW. If the
+		 * file is open readably, then we let ->read_folio() fill in
+		 * the gaps.
 		 */
 		finfo = netfs_folio_info(folio);
 		group = netfs_folio_group(folio);
@@ -283,12 +283,9 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
 
 		/* We don't want to do a streaming write on a file that loses
 		 * caching service temporarily because the backing store got
-		 * culled and we don't really want to get a streaming write on
-		 * a file that's open for reading as ->read_folio() then has to
-		 * be able to flush it.
+		 * culled.
 		 */
-		if ((file->f_mode & FMODE_READ) ||
-		    netfs_is_cache_enabled(ctx)) {
+		if (netfs_is_cache_enabled(ctx)) {
 			if (finfo) {
 				netfs_stat(&netfs_n_wh_wstream_conflict);
 				goto flush_content;

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Steve French, Matthew Wilcox
Subject: [PATCH v6 17/24] netfs: Fix early put of sink folio in netfs_read_gaps()
Date: Tue, 12 May 2026 13:33:54 +0100
Message-ID: <20260512123404.719402-18-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

Fix netfs_read_gaps() to release the sink page it uses after waiting for
the request to complete. The way the sink page is used is that an
ITER_BVEC-class iterator is created that has the gaps from the target folio
at either end, but has the sink page tiled over the middle so that a single
read op can fill in both gaps.
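
As a rough userspace illustration of the tiling described above (a sketch
only, not the kernel code: struct seg, build_gap_segs() and SINK_SIZE are
hypothetical names, and a plain array of segments stands in for the
ITER_BVEC iterator), the segment list for one read might be laid out
like this:

```c
#include <assert.h>
#include <stddef.h>

#define SINK_SIZE 4096	/* assumed size of the throwaway sink buffer */

/* Hypothetical stand-in for one element of a bvec-style iterator. */
struct seg {
	char *base;	/* Where incoming data for this segment lands */
	size_t len;	/* Number of bytes in this segment */
};

/*
 * Describe a read of 'total' bytes into 'target' where [from, to) already
 * holds data that must not be overwritten.  The gap before 'from' and the
 * gap after 'to' point into the target; the middle points at the sink,
 * reused as many times as needed.  Returns the number of segments used,
 * or 0 if the range is invalid or 'max' is too small.
 */
size_t build_gap_segs(struct seg *segs, size_t max, char *target,
		      size_t total, size_t from, size_t to, char *sink)
{
	size_t n = 0, mid;

	if (from > to || to > total)
		return 0;

	if (from > 0) {			/* Leading gap goes to the target */
		if (n == max)
			return 0;
		segs[n].base = target;
		segs[n].len = from;
		n++;
	}

	for (mid = to - from; mid > 0; ) {	/* Middle is discarded */
		size_t part = mid < SINK_SIZE ? mid : SINK_SIZE;

		if (n == max)
			return 0;
		segs[n].base = sink;
		segs[n].len = part;
		n++;
		mid -= part;
	}

	if (to < total) {		/* Trailing gap goes to the target */
		if (n == max)
			return 0;
		segs[n].base = target + to;
		segs[n].len = total - to;
		n++;
	}
	return n;
}
```

Every middle segment points at the same sink buffer, so that buffer has to
stay allocated until the read that fills the segments has finished. That is
the ordering this patch restores by moving the put after the wait.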

The bug was found by KASAN detecting a UAF on the generic/075 xfstest in
the cifsd kernel thread that handles reception of data from the TCP socket:

    BUG: KASAN: use-after-free in _copy_to_iter+0x48a/0xa20
    Write of size 885 at addr ffff888107f92000 by task cifsd/1285

    CPU: 2 UID: 0 PID: 1285 Comm: cifsd Not tainted 7.0.0 #6 PREEMPT(lazy)
    Call Trace:
     dump_stack_lvl+0x5d/0x80
     print_report+0x17f/0x4f1
     kasan_report+0x100/0x1e0
     kasan_check_range+0x10f/0x1e0
     __asan_memcpy+0x3c/0x60
     _copy_to_iter+0x48a/0xa20
     __skb_datagram_iter+0x2c9/0x430
     skb_copy_datagram_iter+0x6e/0x160
     tcp_recvmsg_locked+0xce0/0x1130
     tcp_recvmsg+0xeb/0x300
     inet_recvmsg+0xcf/0x3a0
     sock_recvmsg+0xea/0x100
     cifs_readv_from_socket+0x3a6/0x4d0 [cifs]
     cifs_read_iter_from_socket+0xdd/0x130 [cifs]
     cifs_readv_receive+0xaad/0xb10 [cifs]
     cifs_demultiplex_thread+0x1148/0x1740 [cifs]
     kthread+0x1cf/0x210

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Steve French
Signed-off-by: David Howells
Reviewed-by: Paulo Alcantara (Red Hat)
cc: Paulo Alcantara
cc: Matthew Wilcox
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_read.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 51f844bfbdff..e7ad511e494c 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -457,9 +457,6 @@ static int netfs_read_gaps(struct file *file, struct folio *folio)
 
 	netfs_read_to_pagecache(rreq, NULL);
 
-	if (sink)
-		folio_put(sink);
-
 	ret = netfs_wait_for_read(rreq);
 	if (ret >= 0) {
 		if (group)
@@ -471,6 +468,9 @@ static int netfs_read_gaps(struct file *file, struct folio *folio)
 		flush_dcache_folio(folio);
 		folio_mark_uptodate(folio);
 	}
+
+	if (sink)
+		folio_put(sink);
 	folio_unlock(folio);
 	netfs_put_request(rreq, netfs_rreq_trace_put_return);
 	return ret < 0 ? ret : 0;

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Matthew Wilcox
Subject: [PATCH v6 18/24] netfs: Fix leak of request in netfs_write_begin() error handling
Date: Tue, 12 May 2026 13:33:55 +0100
Message-ID: <20260512123404.719402-19-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

Fix netfs_write_begin() to not leak our ref on the request in the event
that we get an error from netfs_wait_for_read().

Fixes: 4090b31422a6 ("netfs: Add a function to consolidate beginning a read")
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
Signed-off-by: David Howells
cc: Paulo Alcantara
cc: Matthew Wilcox
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_read.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index e7ad511e494c..004d426c02b4 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -687,9 +687,9 @@ int netfs_write_begin(struct netfs_inode *ctx,
 
 	netfs_read_to_pagecache(rreq, NULL);
 	ret = netfs_wait_for_read(rreq);
+	netfs_put_request(rreq, netfs_rreq_trace_put_return);
 	if (ret < 0)
 		goto error;
-	netfs_put_request(rreq, netfs_rreq_trace_put_return);
 
 have_folio:
 	ret = folio_wait_private_2_killable(folio);

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Viacheslav Dubeyko, Matthew Wilcox
Subject: [PATCH v6 19/24] netfs: Fix potential UAF in netfs_unlock_abandoned_read_pages()
Date: Tue, 12 May 2026 13:33:56 +0100
Message-ID: <20260512123404.719402-20-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

netfs_unlock_abandoned_read_pages(rreq) accesses the index of the folios it
is wanting to unlock and compares that to rreq->no_unlock_folio so that it
doesn't unlock a folio being read for netfs_perform_write() or
netfs_write_begin().

However, given that netfs_unlock_abandoned_read_pages() is called _after_
NETFS_RREQ_IN_PROGRESS is cleared, the one folio that it's not allowed to
dereference is the one specified by ->no_unlock_folio as ownership
immediately reverts to the caller.

Fix this by storing the folio pointer instead and using that rather than
the index.

Also fix netfs_unlock_read_folio() where the same applies.
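
A minimal userspace sketch of the pointer comparison, assuming toy
stand-ins for the real structures (toy_folio, toy_request and
toy_unlock_abandoned() are hypothetical names, not netfs API). The skip
test only compares addresses, so the folio the caller owns is never
dereferenced:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a folio: only the fields the sketch needs. */
struct toy_folio {
	unsigned long index;
	bool locked;
};

/* Hypothetical miniature of a read request. */
struct toy_request {
	const struct toy_folio *no_unlock_folio; /* Compared by address only */
	bool no_unlock_flag;			 /* Is no_unlock_folio valid? */
};

/*
 * Unlock every folio in the array except the one the caller still owns.
 * Returns the number of folios unlocked.
 */
size_t toy_unlock_abandoned(const struct toy_request *rreq,
			    struct toy_folio **folios, size_t n)
{
	size_t unlocked = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_folio *folio = folios[i];

		if (!folio)
			continue;
		/* Address comparison: no read of folio->index is needed. */
		if (rreq->no_unlock_flag && folio == rreq->no_unlock_folio)
			continue;
		folio->locked = false;
		unlocked++;
	}
	return unlocked;
}
```

Comparing on an index would have required loading a field from each folio,
including the one whose ownership has already reverted to the caller.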

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
Signed-off-by: David Howells
cc: Paulo Alcantara
cc: Viacheslav Dubeyko
cc: Matthew Wilcox
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/buffered_read.c | 4 ++--
 fs/netfs/read_collect.c | 2 +-
 fs/netfs/read_retry.c | 2 +-
 include/linux/netfs.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 004d426c02b4..83d0b8153e96 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -670,7 +670,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
 		ret = PTR_ERR(rreq);
 		goto error;
 	}
-	rreq->no_unlock_folio = folio->index;
+	rreq->no_unlock_folio = folio;
 	__set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags);
 
 	ret = netfs_begin_cache_read(rreq, ctx);
@@ -736,7 +736,7 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio,
 		goto error;
 	}
 
-	rreq->no_unlock_folio = folio->index;
+	rreq->no_unlock_folio = folio;
 	__set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags);
 	ret = netfs_begin_cache_read(rreq, ctx);
 	if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 3c9b847885c2..23660a590124 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -83,7 +83,7 @@ static void netfs_unlock_read_folio(struct netfs_io_request *rreq,
 	}
 
 just_unlock:
-	if (folio->index == rreq->no_unlock_folio &&
+	if (folio == rreq->no_unlock_folio &&
 	    test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) {
 		_debug("no unlock");
 	} else {
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index e10eb5a07332..f59a70f3a086 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -292,7 +292,7 @@ void netfs_unlock_abandoned_read_pages(struct netfs_io_request *rreq)
 			struct folio *folio = folioq_folio(p, slot);
 
 			if (folio && !folioq_is_marked2(p, slot)) {
-				if (folio->index == rreq->no_unlock_folio &&
+				if (folio == rreq->no_unlock_folio &&
 				    test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO,
 					     &rreq->flags)) {
 					_debug("no unlock");
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 4fd1d796ad73..243c0f737938 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -252,7 +252,7 @@ struct netfs_io_request {
 	unsigned long long collected_to; /* Point we've collected to */
 	unsigned long long cleaned_to; /* Position we've cleaned folios to */
 	unsigned long long abandon_to; /* Position to abandon folios to */
-	pgoff_t no_unlock_folio; /* Don't unlock this folio after read */
+	const struct folio *no_unlock_folio; /* Don't unlock this folio after read */
 	unsigned int direct_bv_count; /* Number of elements in direct_bv[] */
 	unsigned int debug_id;
 	unsigned int rsize; /* Maximum read size (0 for none) */

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Matthew Wilcox
Subject: [PATCH v6 20/24] netfs: Fix partial invalidation of streaming-write folio
Date: Tue, 12 May 2026 13:33:57 +0100
Message-ID: <20260512123404.719402-21-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

In netfs_invalidate_folio(), if the region of a partial invalidation
overlaps the front (but not all) of a dirty write cached in a streaming
write page (dirty, but not uptodate, with the dirty region tracked by a
netfs_folio struct), the function modifies the dirty region - but
incorrectly as it moves the region forward by setting the start to the
start, not the end, of the invalidation region.

Fix this by setting finfo->dirty_offset to the end of the invalidation
region (iend).
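
A worked example of the arithmetic, as a hedged sketch rather than the
kernel function (struct dirty_region and trim_dirty_front() are
hypothetical; fend and iend mirror names in the patch, and fstart is
assumed to be the start of the dirty data):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the dirty-region fields of a netfs_folio. */
struct dirty_region {
	size_t dirty_offset;	/* Start of dirty data within the folio */
	size_t dirty_len;	/* Length of dirty data */
};

/*
 * Trim the front of a dirty region when [offset, offset + length) is
 * invalidated.  Only the case described above is handled: the invalidation
 * starts at or before the dirty data and ends strictly inside it.
 * Returns 1 if the region was trimmed, 0 if the case does not apply.
 */
int trim_dirty_front(struct dirty_region *r, size_t offset, size_t length)
{
	size_t fstart = r->dirty_offset;
	size_t fend = fstart + r->dirty_len;
	size_t iend = offset + length;

	if (offset > fstart || iend <= fstart || iend >= fend)
		return 0;

	r->dirty_len = fend - iend;
	r->dirty_offset = iend;	/* The surviving data begins at iend */
	return 1;
}
```

With dirty data at [1024, 3072) and an invalidation of [512, 1536), the
trimmed region becomes [1536, 3072). Setting the start to offset (512)
with the same length would instead describe [512, 2048), which covers
invalidated bytes and drops the tail of the dirty data.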

Fixes: cce6bfa6ca0e ("netfs: Fix trimming of streaming-write folios in netfs_inval_folio()")
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
Signed-off-by: David Howells
cc: Paulo Alcantara
cc: Matthew Wilcox
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index 24b20e80e9a8..5d554512ed23 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -262,7 +262,7 @@ void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
 				goto erase_completely;
 			/* Move the start of the data. */
 			finfo->dirty_len = fend - iend;
-			finfo->dirty_offset = offset;
+			finfo->dirty_offset = iend;
 			trace_netfs_folio(folio, netfs_folio_trace_invalidate_front);
 			return;
 		}

From nobody Fri Jun 12 21:38:10 2026
From: David Howells
To: Christian Brauner
Cc: David Howells, Paulo Alcantara, netfs@lists.linux.dev,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Matthew Wilcox
Subject: [PATCH v6 21/24] netfs: Fix folio->private handling in netfs_perform_write()
Date: Tue, 12 May 2026 13:33:58 +0100
Message-ID: <20260512123404.719402-22-dhowells@redhat.com>
In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com>
References: <20260512123404.719402-1-dhowells@redhat.com>

Under some circumstances, netfs_perform_write() doesn't correctly
manipulate folio->private between NULL, NETFS_FOLIO_COPY_TO_CACHE, pointing
to a group and pointing to a netfs_folio struct, leading to potential
multiple attachments of private data with associated folio ref leaks and
also leaks of netfs_folio structs or netfs_group refs.

Fix this by consolidating the place at which a folio is marked uptodate in
one place and having that look at what's attached to folio->private and
decide how to clean it up and then set the new group. Also, the content
shouldn't be flushed if group is NULL, even if a group is specified in the
netfs_group parameter, as that would be the case for a new folio. A
filesystem should always specify netfs_group or never specify netfs_group.

The Sashiko auto-review tool noted that it was theoretically possible that
the fpos >= ctx->zero_point section might leak if it modified a streaming
write folio. This is unlikely, but with a network filesystem, third party
changes can happen.
It also pointed out that __netfs_set_group() would leak if called multiple times on the same folio from the "whole folio modify section". Fixes: 8f52de0077ba ("netfs: Reduce number of conditional branches in netfs= _perform_write()") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40= redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_write.c | 134 +++++++++++++++++++++-------------- include/trace/events/netfs.h | 1 + 2 files changed, 82 insertions(+), 53 deletions(-) diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index f79fb5996540..6bde3320bcec 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -12,24 +12,6 @@ #include #include "internal.h" =20 -static void __netfs_set_group(struct folio *folio, struct netfs_group *net= fs_group) -{ - if (netfs_group) - folio_attach_private(folio, netfs_get_group(netfs_group)); -} - -static void netfs_set_group(struct folio *folio, struct netfs_group *netfs= _group) -{ - void *priv =3D folio_get_private(folio); - - if (unlikely(priv !=3D netfs_group)) { - if (netfs_group && (!priv || priv =3D=3D NETFS_FOLIO_COPY_TO_CACHE)) - folio_attach_private(folio, netfs_get_group(netfs_group)); - else if (!netfs_group && priv =3D=3D NETFS_FOLIO_COPY_TO_CACHE) - folio_detach_private(folio); - } -} - /* * Grab a folio for writing and lock it. Attempt to allocate as large a f= olio * as possible to hold as much of the remaining length as possible in one = go. 
@@ -157,6 +139,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, size_t offset; /* Offset into pagecache folio */ size_t part; /* Bytes to write to folio */ size_t copied; /* Bytes copied from user */ + void *priv; =20 offset =3D pos & (max_chunk - 1); part =3D min(max_chunk - offset, iov_iter_count(iter)); @@ -202,6 +185,25 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct= iov_iter *iter, goto error_folio_unlock; } =20 + finfo =3D netfs_folio_info(folio); + group =3D netfs_folio_group(folio); + + /* If the requested group differs from the group set on the + * page, then we need to flush out the folio if it has a group + * set (ie. is non-NULL). Note that COPY_TO_CACHE is a special + * case, being a netfs annotation rather than an actual group. + * + * The filesystem isn't permitted to mix writes with groups and + * writes without groups as the NULL group is used to indicate + * that no group is set. + */ + if (unlikely(group !=3D netfs_group) && + group !=3D NETFS_FOLIO_COPY_TO_CACHE && + group) { + WARN_ON_ONCE(!netfs_group); + goto flush_content; + } + /* Decide how we should modify a folio. We might be attempting * to do write-streaming, as we don't want to a local RMW cycle * if we can avoid it. If we're doing local caching or content @@ -209,22 +211,14 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struc= t iov_iter *iter, * file is open readably, then we let ->read_folio() fill in * the gaps. 
*/ - finfo =3D netfs_folio_info(folio); - group =3D netfs_folio_group(folio); - - if (unlikely(group !=3D netfs_group) && - group !=3D NETFS_FOLIO_COPY_TO_CACHE) - goto flush_content; - if (folio_test_uptodate(folio)) { if (mapping_writably_mapped(mapping)) flush_dcache_folio(folio); copied =3D copy_folio_from_iter_atomic(folio, offset, part, iter); if (unlikely(copied =3D=3D 0)) goto copy_failed; - netfs_set_group(folio, netfs_group); trace =3D netfs_folio_is_uptodate; - goto copied; + goto copied_uptodate; } =20 /* If the page is above the zero-point then we assume that the @@ -237,24 +231,22 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struc= t iov_iter *iter, if (unlikely(copied =3D=3D 0)) goto copy_failed; folio_zero_segment(folio, offset + copied, flen); - __netfs_set_group(folio, netfs_group); - folio_mark_uptodate(folio); - trace =3D netfs_modify_and_clear; - goto copied; + if (finfo) + trace =3D netfs_modify_and_clear_rm_finfo; + else + trace =3D netfs_modify_and_clear; + goto mark_uptodate; } =20 /* See if we can write a whole folio in one go. 
*/ if (!maybe_trouble && offset =3D=3D 0 && part >=3D flen) { copied =3D copy_folio_from_iter_atomic(folio, offset, part, iter); if (likely(copied =3D=3D part)) { - if (finfo) { + if (finfo) trace =3D netfs_whole_folio_modify_filled; - goto folio_now_filled; - } - __netfs_set_group(folio, netfs_group); - folio_mark_uptodate(folio); - trace =3D netfs_whole_folio_modify; - goto copied; + else + trace =3D netfs_whole_folio_modify; + goto mark_uptodate; } if (copied =3D=3D 0) goto copy_failed; @@ -272,7 +264,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, finfo->dirty_len +=3D finfo->dirty_offset; if (finfo->dirty_len =3D=3D flen) { trace =3D netfs_whole_folio_modify_filled_efault; - goto folio_now_filled; + goto mark_uptodate; } if (copied > finfo->dirty_len) finfo->dirty_len =3D copied; @@ -300,11 +292,11 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struc= t iov_iter *iter, copied =3D copy_folio_from_iter_atomic(folio, offset, part, iter); if (unlikely(copied =3D=3D 0)) goto copy_failed; - netfs_set_group(folio, netfs_group); trace =3D netfs_just_prefetch; - goto copied; + goto copied_uptodate; } =20 + /* Do a streaming write on a folio that has nothing in it yet. 
*/ if (!finfo) { ret =3D -EIO; if (WARN_ON(folio_get_private(folio))) @@ -313,10 +305,8 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct= iov_iter *iter, if (unlikely(copied =3D=3D 0)) goto copy_failed; if (offset =3D=3D 0 && copied =3D=3D flen) { - __netfs_set_group(folio, netfs_group); - folio_mark_uptodate(folio); trace =3D netfs_streaming_filled_page; - goto copied; + goto mark_uptodate; } =20 finfo =3D kzalloc_obj(*finfo); @@ -345,7 +335,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct = iov_iter *iter, finfo->dirty_len +=3D copied; if (finfo->dirty_offset =3D=3D 0 && finfo->dirty_len =3D=3D flen) { trace =3D netfs_streaming_cont_filled_page; - goto folio_now_filled; + goto mark_uptodate; } trace =3D netfs_streaming_write_cont; goto copied; @@ -361,13 +351,36 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struc= t iov_iter *iter, goto out; continue; =20 - folio_now_filled: - if (finfo->netfs_group) - folio_change_private(folio, finfo->netfs_group); - else - folio_detach_private(folio); + /* Mark a folio as being up to date when we've filled it + * completely. If the folio has a group attached, then it must + * be the same group, otherwise we should have flushed it out + * above. We have to get rid of the netfs_folio struct if + * there was one. + */ + mark_uptodate: folio_mark_uptodate(folio); - kfree(finfo); + + copied_uptodate: + priv =3D folio_get_private(folio); + if (likely(priv =3D=3D netfs_group)) { + /* Already set correctly; no change required.
*/ + } else if (priv =3D=3D NETFS_FOLIO_COPY_TO_CACHE) { + if (!netfs_group) + folio_detach_private(folio); + else + folio_change_private(folio, netfs_get_group(netfs_group)); + } else if (!priv) { + folio_attach_private(folio, netfs_get_group(netfs_group)); + } else { + WARN_ON_ONCE(!finfo); + if (netfs_group) + /* finfo->netfs_group has a ref */ + folio_change_private(folio, netfs_group); + else + folio_detach_private(folio); + kfree(finfo); + } + copied: trace_netfs_folio(folio, trace); flush_dcache_folio(folio); @@ -530,6 +543,7 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, str= uct netfs_group *netfs_gr struct inode *inode =3D file_inode(file); struct netfs_inode *ictx =3D netfs_inode(inode); vm_fault_t ret =3D VM_FAULT_NOPAGE; + void *priv; int err; =20 _enter("%lx", folio->index); @@ -550,7 +564,9 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, str= uct netfs_group *netfs_gr } =20 group =3D netfs_folio_group(folio); - if (group !=3D netfs_group && group !=3D NETFS_FOLIO_COPY_TO_CACHE) { + if (group && + group !=3D netfs_group && + group !=3D NETFS_FOLIO_COPY_TO_CACHE) { folio_unlock(folio); err =3D filemap_fdatawrite_range(mapping, folio_pos(folio), @@ -572,7 +588,19 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, st= ruct netfs_group *netfs_gr trace_netfs_folio(folio, netfs_folio_trace_mkwrite_plus); else trace_netfs_folio(folio, netfs_folio_trace_mkwrite); - netfs_set_group(folio, netfs_group); + + priv =3D folio_get_private(folio); + if (priv !=3D netfs_group) { + if (!netfs_group && priv =3D=3D NETFS_FOLIO_COPY_TO_CACHE) + folio_detach_private(folio); + else if (netfs_group && priv =3D=3D NETFS_FOLIO_COPY_TO_CACHE) + folio_change_private(folio, netfs_get_group(netfs_group)); + else if (netfs_group && !priv) + folio_attach_private(folio, netfs_get_group(netfs_group)); + else + WARN_ON_ONCE(1); + } + file_update_time(file); set_bit(NETFS_ICTX_MODIFIED_ATTR, &ictx->flags); if (ictx->ops->post_modify) diff --git 
a/include/trace/events/netfs.h b/include/trace/events/netfs.h index aa9940ba307b..082cb03c6131 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -181,6 +181,7 @@ EM(netfs_whole_folio_modify_filled, "mod-whole-f+") \ EM(netfs_whole_folio_modify_filled_efault, "mod-whole-f+!") \ EM(netfs_modify_and_clear, "mod-n-clear") \ + EM(netfs_modify_and_clear_rm_finfo, "mod-n-clear+") \ EM(netfs_streaming_write, "mod-streamw") \ EM(netfs_streaming_write_cont, "mod-streamw+") \ EM(netfs_flush_content, "flush") \ From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 00DA33AB5C0 for ; Tue, 12 May 2026 12:36:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589378; cv=none; b=eMhaIDQfUvbQgdGnfBfx/TaI3yYqjGn1OY1Wd+SKbrxCDVQKM/jMSi6NVx+gsRXLuzShAYp9H4rWnvtMexrqbfSPI5eqgjUAFw1FccjKVxsqX+jCjD3OJhMWEKa+/1uEPx1WdjLTBEti7NUuqPgg9qvknA8T83fModKJeBWG6Yw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589378; c=relaxed/simple; bh=1kNUWeWN+ykCg/OJUqROcQrPNt64HsFYpSs6E1pXHp8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ptN0h1+bdkOSO6C7LHykiyN9DMPwx+Q3RBTGg52CsRWyQgT3ZLwRnfgzqPqKGp17FwsYeoBkGbly4IeBymqxtQ1oe2JBTctBv4ghQrJTzarjI75y3IbLYHKfXqGVbvZDfTUZaEEetcStd27MTMmRhBaNolu75+ZKwyy4kt2fEIM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=JTclZaNo; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JTclZaNo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589363; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UKJIJgGRVCiWeJXzVXVwYgNOYUcH+tddDbLra7bwmdo=; b=JTclZaNoIr05A+bbhzsMGh0M+7KH7GX0dt2tBwYeW4YKHZCwYGXq9E+L3TLZ50iIC1o11s z4NKvEOCL4EyEuJD9hGyfe15CwdZKvGwhEDBZQa1kTzj5C4GMfZ5z1yG8+DmaU676AAEhs wV+3bh9+kAz+7EPjqjU+bxVaan3HP7Y= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-669-QZDO5lcAOWa_72c4pTMMRg-1; Tue, 12 May 2026 08:35:58 -0400 X-MC-Unique: QZDO5lcAOWa_72c4pTMMRg-1 X-Mimecast-MFC-AGG-ID: QZDO5lcAOWa_72c4pTMMRg_1778589356 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A200519560A2; Tue, 12 May 2026 12:35:56 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 8F2AE1803A94; Tue, 12 May 2026 12:35:53 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, 
linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox Subject: [PATCH v6 22/24] netfs: Fix netfs_read_folio() to wait on writeback Date: Tue, 12 May 2026 13:33:59 +0100 Message-ID: <20260512123404.719402-23-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Fix netfs_read_folio() to wait for an ongoing writeback to complete so that it can trust the dirty flag and whatever is attached to folio->private (folio->private may get cleaned up by the collector before it clears the writeback flag). Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40= redhat.com Signed-off-by: David Howells cc: Paulo Alcantara cc: Matthew Wilcox cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- fs/netfs/buffered_read.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c index 83d0b8153e96..76d0f6a29aba 100644 --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -503,6 +503,8 @@ int netfs_read_folio(struct file *file, struct folio *f= olio) struct netfs_inode *ctx =3D netfs_inode(mapping->host); int ret; =20 + folio_wait_writeback(folio); + if (folio_test_dirty(folio)) return netfs_read_gaps(file, folio); From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EDB013AB5DF for ; Tue, 12 
May 2026 12:36:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589376; cv=none; b=ABRwrEOHsyDgvaMX4URx0CpdvDGmLHAVhFu/jRJJTmfiyU/zKJ639ms9sXDVHiMXzq4YUanN/IiCarbA1e+i4BvhiER62JHUjj53Vh9I3CZ9ntvwkILBjR150mbQH7cLVXoVwwvpszcNZE2UmBNG7ISe3H36QwBzTjcvAvYhuKk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589376; c=relaxed/simple; bh=JWtdSEkTLIP+RcKL4y5Cq/iLGRvfC6WoJtJdaBsUoHE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XzsWm8poHXpMK5D9ERLvWFmShs5/Jum3AgCXXhdOz1r6KrH+eMjKIG2y5v3NdS8tcypBkneLuTLnzR32BM82QAaQPmcQurHnFLMMZOceoDhS3qw7fzcizR2VBOq0wSCzxvPwXQPutqjY2RRRnv/UmXDOrwEOmlL18BC6fXyWxHw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=AsrrgNbv; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="AsrrgNbv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589364; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y4OkQu6y5vCz1G5DtZoiQAaZ6tnWOMrRSNCWiJ+GppE=; b=AsrrgNbvz8eK/0/gHk+dUtVb0qwLL1mKVq16W6D+w1YSCe1IEx0hQYtwTA6dpAtETWz6AS +rofsepwNR+3f3xT5UQ9LDgUqCJdYd2zzgm4ELUNjaKwM4WCA6Y6CYyH+vhhm4Slx0qIdH YwD2HVccqS6vGVvXygILpnX5qloMnRs= Received: from 
mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-658-6tvrOHUNOviDFZc-VJWZvw-1; Tue, 12 May 2026 08:36:02 -0400 X-MC-Unique: 6tvrOHUNOviDFZc-VJWZvw-1 X-Mimecast-MFC-AGG-ID: 6tvrOHUNOviDFZc-VJWZvw_1778589361 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 397891956096; Tue, 12 May 2026 12:36:01 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4849818004A3; Tue, 12 May 2026 12:35:58 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Marc Dionne Subject: [PATCH v6 23/24] netfs, afs: Fix write skipping in dir/link writepages Date: Tue, 12 May 2026 13:34:00 +0100 Message-ID: <20260512123404.719402-24-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Fix netfs_writeback_single() and afs_single_writepages() to better handle a write that would be skipped due to lock contention and WB_SYNC_NONE by returning 1
from netfs_writeback_single() if it skipped and making afs_single_writepages() skip also. If a skip occurs, the inode must be re-marked as dirty, as the VFS may have cleared the mark. This is really only theoretical for directories in netfs_writeback_single(), as the only path to it is through afs_single_writepages(), which takes the ->validate_lock around it, thereby serialising it. Fixes: 6dd80936618c ("afs: Use netfslib for directories") Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org --- fs/afs/dir.c | 11 ++++++++++- fs/netfs/write_issue.c | 7 ++++++- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index aaaa55878ffd..d1542a1a50bf 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -2206,7 +2206,14 @@ int afs_single_writepages(struct address_space *mapp= ing, /* Need to lock to prevent the folio queue and folios from being thrown * away. */ - down_read(&dvnode->validate_lock); + if (!down_read_trylock(&dvnode->validate_lock)) { + if (wbc->sync_mode =3D=3D WB_SYNC_NONE) { + /* The VFS will have undirtied the inode. */ + netfs_single_mark_inode_dirty(&dvnode->netfs.inode); + return 0; + } + down_read(&dvnode->validate_lock); + } =20 if (is_dir ? test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) : @@ -2214,6 +2221,8 @@ int afs_single_writepages(struct address_space *mappi= ng, iov_iter_folio_queue(&iter, ITER_SOURCE, dvnode->directory, 0, 0, i_size_read(&dvnode->netfs.inode)); ret =3D netfs_writeback_single(mapping, wbc, &iter); + if (ret =3D=3D 1) + ret =3D 0; /* Skipped write due to lock conflict. */ } =20 up_read(&dvnode->validate_lock); diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c index 03961622996b..c03c7cc45e47 100644 --- a/fs/netfs/write_issue.c +++ b/fs/netfs/write_issue.c @@ -830,6 +830,9 @@ static int netfs_write_folio_single(struct netfs_io_req= uest *wreq, * * Write a monolithic, non-pagecache object back to the server and/or * the cache.
+ * + * Return: 0 if successful; 1 if skipped due to lock conflict and WB_SYNC_= NONE; + * or a negative error code. */ int netfs_writeback_single(struct address_space *mapping, struct writeback_control *wbc, @@ -846,8 +849,10 @@ int netfs_writeback_single(struct address_space *mappi= ng, =20 if (!mutex_trylock(&ictx->wb_lock)) { if (wbc->sync_mode =3D=3D WB_SYNC_NONE) { + /* The VFS will have undirtied the inode. */ + netfs_single_mark_inode_dirty(&ictx->inode); netfs_stat(&netfs_n_wb_lock_skip); - return 0; + return 1; } netfs_stat(&netfs_n_wb_lock_wait); mutex_lock(&ictx->wb_lock); From nobody Fri Jun 12 21:38:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 210473ADBB3 for ; Tue, 12 May 2026 12:36:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589384; cv=none; b=ov7NPU7ogEX8VauUCBNJcLglXx3cZ94/l89hVdO1qLXX3syKwZm7bfwNer8ymjDynJHK60u1sUFKRXeRI8/f8zehs0TxCbnv8b9TYr9wBnh7wq+f8mKdXqb6B1jRm2uC1KZDkEjskCVvnCq7oYYZYo7hMcN7TqKtm4hWAm+3BzE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778589384; c=relaxed/simple; bh=LPm03toHRfNAnMqMpavtXMqR/OI0dRMLzsY3s7Dwo9o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BZw6Ng694tu/CgnENurf6Wnf3zB+idHzQ2g2TarA55iRGSu4Rvk5gd+6/ixKxTkyBXkKzG8wgO43ne61qXtAkMQY1YLylx6q8LVjp7efNom4zUDOGG8YuMAWgsMBpi6szUSjT92O1tZ7dEW5GA3mbtGwpYgp5h/hPYSg1hvSaro= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XLozhA5r; arc=none smtp.client-ip=170.10.133.124 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XLozhA5r" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778589370; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/HMktJZpD1jhGP3LJ6hHNsj4p+y5y7krTOvAntwF2eY=; b=XLozhA5rwmmz/bldNdtMsd8C2yUVnRx/amB96yKzu21yCinqq4uL+ixNFm13EiVdxzJzO7 aPSCAJskgaw3sDPIyw5QQrF/BR7n4Rv/Bv+ZSUh0O8UPg/u4IzAY35JbKS3UxkTrOQHXg1 FGq8QYiEMkOtDqc8C1ppw7Stx3t+yJU= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-682-FlPuaaoJMiSy4Pi3todtyg-1; Tue, 12 May 2026 08:36:07 -0400 X-MC-Unique: FlPuaaoJMiSy4Pi3todtyg-1 X-Mimecast-MFC-AGG-ID: FlPuaaoJMiSy4Pi3todtyg_1778589366 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DF75D18005BE; Tue, 12 May 2026 12:36:05 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.44.48.83]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id EAB761800352; Tue, 12 May 2026 12:36:02 +0000 (UTC) From: David Howells To: Christian Brauner Cc: David Howells , Paulo Alcantara , 
netfs@lists.linux.dev, linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Marc Dionne Subject: [PATCH v6 24/24] afs: Fix the locking used by afs_get_link() Date: Tue, 12 May 2026 13:34:01 +0100 Message-ID: <20260512123404.719402-25-dhowells@redhat.com> In-Reply-To: <20260512123404.719402-1-dhowells@redhat.com> References: <20260512123404.719402-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8"

The afs filesystem in the kernel doesn't do locking correctly for symbolic links. There are a number of problems:

 (1) It doesn't do any locking around afs_read_single() to prevent races between multiple ->get_link() calls, thereby allowing the possibility of leaks.

 (2) It doesn't use RCU barriers when accessing the buffer pointers during RCU pathwalk.

 (3) It can race with another thread updating the contents of the symlink if a third party updated it on the server.

Fix this by the following means:

 (0) Move symlink handling into its own file, as these changes make it more complicated.

 (1) Take the validate_lock around afs_read_single() to prevent races between multiple ->get_link() calls.

 (2) Keep a separate copy of the symlink contents with an rcu_head. This is always going to be a lot smaller than a page, so it can be kmalloc'd, saving quite a bit of memory. It also needs a refcount for non-RCU pathwalk.

 (3) Split the symlink read and write-to-cache routines in afs from those for directories.

 (4) Discard the I/O buffer as soon as the write-to-cache completes, as this is a full page (plus a folio_queue).

 (5) If there's no cache, discard the I/O buffer immediately after reading and copying.
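To illustrate the "separate refcounted copy" idea in the list above, here is a minimal userspace sketch. It is not the code from this patch: struct symlink_copy and the symlink_copy_*() helpers are hypothetical names, malloc() and C11 atomics stand in for the kernel's allocator and refcount_t, and the RCU side (the rcu_head and deferred freeing needed by RCU pathwalk) is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a refcounted symlink copy. */
struct symlink_copy {
	atomic_uint ref;	/* Number of holders of this copy */
	char content[];		/* NUL-terminated symlink target */
};

/* Allocate a copy sized to the content, holding one reference. */
static struct symlink_copy *symlink_copy_alloc(const char *content)
{
	size_t clen = strlen(content);
	struct symlink_copy *s = malloc(sizeof(*s) + clen + 1);

	if (!s)
		return NULL;
	atomic_init(&s->ref, 1);
	memcpy(s->content, content, clen + 1);	/* Includes the NUL */
	return s;
}

/* Take an extra reference, as a non-RCU ->get_link() caller might. */
static struct symlink_copy *symlink_copy_get(struct symlink_copy *s)
{
	atomic_fetch_add(&s->ref, 1);
	return s;
}

/* Drop a reference; returns 1 if this was the last one and the copy
 * was freed, 0 otherwise.
 */
static int symlink_copy_put(struct symlink_copy *s)
{
	unsigned int old = atomic_fetch_sub(&s->ref, 1);

	assert(old > 0);
	if (old != 1)
		return 0;
	free(s);
	return 1;
}
```

Because the copy is sized to the content rather than to a page, a short target such as "../target" costs only a few dozen bytes in this sketch, which is the memory saving that point (2) alludes to.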
Fixes: eae9e78951bb ("afs: Use netfslib for symlinks, allowing them to be c= ached") Fixes: 6698c02d64b2 ("afs: Locally initialise the contents of a new symlink= on creation") Closes: https://sashiko.dev/#/patchset/20260326104544.509518-1-dhowells%40r= edhat.com Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org --- fs/afs/Makefile | 1 + fs/afs/dir.c | 68 +++++------ fs/afs/fsclient.c | 4 +- fs/afs/inode.c | 96 +-------------- fs/afs/internal.h | 34 ++++-- fs/afs/symlink.c | 278 ++++++++++++++++++++++++++++++++++++++++++++ fs/afs/validation.c | 14 ++- fs/afs/yfsclient.c | 4 +- 8 files changed, 357 insertions(+), 142 deletions(-) create mode 100644 fs/afs/symlink.c diff --git a/fs/afs/Makefile b/fs/afs/Makefile index b49b8fe682f3..0d8f1982d596 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -30,6 +30,7 @@ kafs-y :=3D \ server.o \ server_list.o \ super.o \ + symlink.o \ validation.o \ vlclient.o \ vl_alias.o \ diff --git a/fs/afs/dir.c b/fs/afs/dir.c index d1542a1a50bf..498b99ccdf0e 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -44,6 +44,8 @@ static int afs_symlink(struct mnt_idmap *idmap, struct in= ode *dir, static int afs_rename(struct mnt_idmap *idmap, struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags); +static int afs_dir_writepages(struct address_space *mapping, + struct writeback_control *wbc); =20 const struct file_operations afs_dir_file_operations =3D { .open =3D afs_dir_open, @@ -68,7 +70,7 @@ const struct inode_operations afs_dir_inode_operations = =3D { }; =20 const struct address_space_operations afs_dir_aops =3D { - .writepages =3D afs_single_writepages, + .writepages =3D afs_dir_writepages, }; =20 const struct dentry_operations afs_fs_dentry_operations =3D { @@ -233,22 +235,13 @@ static ssize_t afs_do_read_single(struct afs_vnode *d= vnode, struct file *file) struct iov_iter iter; ssize_t ret; loff_t i_size; - bool 
is_dir =3D (S_ISDIR(dvnode->netfs.inode.i_mode) && - !test_bit(AFS_VNODE_MOUNTPOINT, &dvnode->flags)); =20 i_size =3D i_size_read(&dvnode->netfs.inode); - if (is_dir) { - if (i_size < AFS_DIR_BLOCK_SIZE) - return afs_bad(dvnode, afs_file_error_dir_small); - if (i_size > AFS_DIR_BLOCK_SIZE * 1024) { - trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big); - return -EFBIG; - } - } else { - if (i_size > AFSPATHMAX) { - trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big); - return -EFBIG; - } + if (i_size < AFS_DIR_BLOCK_SIZE) + return afs_bad(dvnode, afs_file_error_dir_small); + if (i_size > AFS_DIR_BLOCK_SIZE * 1024) { + trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big); + return -EFBIG; } =20 /* Expand the storage. TODO: Shrink the storage too. */ @@ -277,24 +270,18 @@ static ssize_t afs_do_read_single(struct afs_vnode *d= vnode, struct file *file) * buffer. */ ret =3D -ESTALE; - } else if (is_dir) { + } else { int ret2 =3D afs_dir_check(dvnode); =20 if (ret2 < 0) ret =3D ret2; - } else if (i_size < folioq_folio_size(dvnode->directory, 0)) { - /* NUL-terminate a symlink. 
*/ - char *symlink =3D kmap_local_folio(folioq_folio(dvnode->directory, 0), = 0); - - symlink[i_size] =3D 0; - kunmap_local(symlink); } } =20 return ret; } =20 -ssize_t afs_read_single(struct afs_vnode *dvnode, struct file *file) +static ssize_t afs_read_single(struct afs_vnode *dvnode, struct file *file) { ssize_t ret; =20 @@ -1763,13 +1750,20 @@ static int afs_link(struct dentry *from, struct ino= de *dir, return ret; } =20 +static void afs_symlink_put(struct afs_operation *op) +{ + kfree(op->create.symlink); + op->create.symlink =3D NULL; + afs_create_put(op); +} + static const struct afs_operation_ops afs_symlink_operation =3D { .issue_afs_rpc =3D afs_fs_symlink, .issue_yfs_rpc =3D yfs_fs_symlink, .success =3D afs_create_success, .aborted =3D afs_check_for_remote_deletion, .edit_dir =3D afs_create_edit_dir, - .put =3D afs_create_put, + .put =3D afs_symlink_put, }; =20 /* @@ -1779,7 +1773,9 @@ static int afs_symlink(struct mnt_idmap *idmap, struc= t inode *dir, struct dentry *dentry, const char *content) { struct afs_operation *op; + struct afs_symlink *symlink; struct afs_vnode *dvnode =3D AFS_FS_I(dir); + size_t clen =3D strlen(content); int ret; =20 _enter("{%llx:%llu},{%pd},%s", @@ -1791,12 +1787,20 @@ static int afs_symlink(struct mnt_idmap *idmap, str= uct inode *dir, goto error; =20 ret =3D -EINVAL; - if (strlen(content) >=3D AFSPATHMAX) + if (clen >=3D AFSPATHMAX) + goto error; + + ret =3D -ENOMEM; + symlink =3D kmalloc_flex(struct afs_symlink, content, clen + 1, GFP_KERNE= L); + if (!symlink) goto error; + refcount_set(&symlink->ref, 1); + memcpy(symlink->content, content, clen + 1); =20 op =3D afs_alloc_operation(NULL, dvnode->volume); if (IS_ERR(op)) { ret =3D PTR_ERR(op); + kfree(symlink); goto error; } =20 @@ -1808,7 +1812,7 @@ static int afs_symlink(struct mnt_idmap *idmap, struc= t inode *dir, op->dentry =3D dentry; op->ops =3D &afs_symlink_operation; op->create.reason =3D afs_edit_dir_for_symlink; - op->create.symlink =3D content; + 
op->create.symlink =3D symlink; op->mtime =3D current_time(dir); ret =3D afs_do_sync_operation(op); afs_dir_unuse_cookie(dvnode, ret); @@ -2192,15 +2196,13 @@ static int afs_rename(struct mnt_idmap *idmap, stru= ct inode *old_dir, } =20 /* - * Write the file contents to the cache as a single blob. + * Write the directory contents to the cache as a single blob. */ -int afs_single_writepages(struct address_space *mapping, - struct writeback_control *wbc) +static int afs_dir_writepages(struct address_space *mapping, + struct writeback_control *wbc) { struct afs_vnode *dvnode =3D AFS_FS_I(mapping->host); struct iov_iter iter; - bool is_dir =3D (S_ISDIR(dvnode->netfs.inode.i_mode) && - !test_bit(AFS_VNODE_MOUNTPOINT, &dvnode->flags)); int ret =3D 0; =20 /* Need to lock to prevent the folio queue and folios from being thrown @@ -2215,9 +2217,7 @@ int afs_single_writepages(struct address_space *mappi= ng, down_read(&dvnode->validate_lock); } =20 - if (is_dir ? - test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags) : - atomic64_read(&dvnode->cb_expires_at) !=3D AFS_NO_CB_PROMISE) { + if (test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) { iov_iter_folio_queue(&iter, ITER_SOURCE, dvnode->directory, 0, 0, i_size_read(&dvnode->netfs.inode)); ret =3D netfs_writeback_single(mapping, wbc, &iter); diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 95494d5f2b8a..a2ffd60889f8 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -886,7 +886,7 @@ void afs_fs_symlink(struct afs_operation *op) namesz =3D name->len; padsz =3D (4 - (namesz & 3)) & 3; =20 - c_namesz =3D strlen(op->create.symlink); + c_namesz =3D strlen(op->create.symlink->content); c_padsz =3D (4 - (c_namesz & 3)) & 3; =20 reqsz =3D (6 * 4) + namesz + padsz + c_namesz + c_padsz + (6 * 4); @@ -910,7 +910,7 @@ void afs_fs_symlink(struct afs_operation *op) bp =3D (void *) bp + padsz; } *bp++ =3D htonl(c_namesz); - memcpy(bp, op->create.symlink, c_namesz); + memcpy(bp, op->create.symlink->content, c_namesz); bp =3D (void *) 
bp + c_namesz;
 	if (c_padsz > 0) {
 		memset(bp, 0, c_padsz);
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 19fe2e392885..3f48458694ba 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -25,96 +25,6 @@
 #include "internal.h"
 #include "afs_fs.h"
 
-void afs_init_new_symlink(struct afs_vnode *vnode, struct afs_operation *op)
-{
-	size_t size = strlen(op->create.symlink) + 1;
-	size_t dsize = 0;
-	char *p;
-
-	if (netfs_alloc_folioq_buffer(NULL, &vnode->directory, &dsize, size,
-				      mapping_gfp_mask(vnode->netfs.inode.i_mapping)) < 0)
-		return;
-
-	vnode->directory_size = dsize;
-	p = kmap_local_folio(folioq_folio(vnode->directory, 0), 0);
-	memcpy(p, op->create.symlink, size);
-	kunmap_local(p);
-	set_bit(AFS_VNODE_DIR_READ, &vnode->flags);
-	netfs_single_mark_inode_dirty(&vnode->netfs.inode);
-}
-
-static void afs_put_link(void *arg)
-{
-	struct folio *folio = virt_to_folio(arg);
-
-	kunmap_local(arg);
-	folio_put(folio);
-}
-
-const char *afs_get_link(struct dentry *dentry, struct inode *inode,
-			 struct delayed_call *callback)
-{
-	struct afs_vnode *vnode = AFS_FS_I(inode);
-	struct folio *folio;
-	char *content;
-	ssize_t ret;
-
-	if (!dentry) {
-		/* RCU pathwalk.
*/
-		if (!test_bit(AFS_VNODE_DIR_READ, &vnode->flags) || !afs_check_validity(vnode))
-			return ERR_PTR(-ECHILD);
-		goto good;
-	}
-
-	if (test_bit(AFS_VNODE_DIR_READ, &vnode->flags))
-		goto fetch;
-
-	ret = afs_validate(vnode, NULL);
-	if (ret < 0)
-		return ERR_PTR(ret);
-
-	if (!test_and_clear_bit(AFS_VNODE_ZAP_DATA, &vnode->flags) &&
-	    test_bit(AFS_VNODE_DIR_READ, &vnode->flags))
-		goto good;
-
-fetch:
-	ret = afs_read_single(vnode, NULL);
-	if (ret < 0)
-		return ERR_PTR(ret);
-	set_bit(AFS_VNODE_DIR_READ, &vnode->flags);
-
-good:
-	folio = folioq_folio(vnode->directory, 0);
-	folio_get(folio);
-	content = kmap_local_folio(folio, 0);
-	set_delayed_call(callback, afs_put_link, content);
-	return content;
-}
-
-int afs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
-{
-	DEFINE_DELAYED_CALL(done);
-	const char *content;
-	int len;
-
-	content = afs_get_link(dentry, d_inode(dentry), &done);
-	if (IS_ERR(content)) {
-		do_delayed_call(&done);
-		return PTR_ERR(content);
-	}
-
-	len = umin(strlen(content), buflen);
-	if (copy_to_user(buffer, content, len))
-		len = -EFAULT;
-	do_delayed_call(&done);
-	return len;
-}
-
-static const struct inode_operations afs_symlink_inode_operations = {
-	.get_link = afs_get_link,
-	.readlink = afs_readlink,
-};
-
 static noinline void dump_vnode(struct afs_vnode *vnode, struct afs_vnode *parent_vnode)
 {
 	static unsigned long once_only;
@@ -214,7 +124,7 @@ static int afs_inode_init_from_status(struct afs_operation *op,
 			inode->i_mode = S_IFLNK | status->mode;
 			inode->i_op = &afs_symlink_inode_operations;
 		}
-		inode->i_mapping->a_ops = &afs_dir_aops;
+		inode->i_mapping->a_ops = &afs_symlink_aops;
 		inode_nohighmem(inode);
 		mapping_set_release_always(inode->i_mapping);
 		break;
@@ -769,12 +679,14 @@ void afs_evict_inode(struct inode *inode)
 			.range_end = LLONG_MAX,
 		};
 
-		afs_single_writepages(inode->i_mapping, &wbc);
+		inode->i_mapping->a_ops->writepages(inode->i_mapping, &wbc);
 	}
 
 	netfs_wait_for_outstanding_io(inode);
 	truncate_inode_pages_final(&inode->i_data);
 	netfs_free_folioq_buffer(vnode->directory);
+	if (vnode->symlink)
+		afs_evict_symlink(vnode);
 
 	afs_set_cache_aux(vnode, &aux);
 	netfs_clear_inode_writeback(inode, &aux);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 816dc848ea71..0b72a8566299 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -710,6 +710,7 @@ struct afs_vnode {
 #define AFS_VNODE_DIR_READ 11 /* Set if we've read a dir's contents */
 
 	struct folio_queue *directory; /* Directory contents */
+	struct afs_symlink __rcu *symlink; /* Symlink content */
 	struct list_head wb_keys; /* List of keys available for writeback */
 	struct list_head pending_locks; /* locks waiting to be granted */
 	struct list_head granted_locks; /* locks granted on this file */
@@ -776,6 +777,15 @@ struct afs_permits {
 	struct afs_permit permits[] __counted_by(nr_permits); /* List of permits sorted by key pointer */
 };
 
+/*
+ * Copy of symlink content for normal use.
+ */
+struct afs_symlink {
+	struct rcu_head rcu;
+	refcount_t ref;
+	char content[];
+};
+
 /*
  * Error prioritisation and accumulation.
  */
@@ -887,7 +897,7 @@ struct afs_operation {
 		struct {
 			int reason; /* enum afs_edit_dir_reason */
 			mode_t mode;
-			const char *symlink;
+			struct afs_symlink *symlink;
 		} create;
 		struct {
 			bool need_rehash;
@@ -1098,13 +1108,10 @@ extern const struct inode_operations afs_dir_inode_operations;
 extern const struct address_space_operations afs_dir_aops;
 extern const struct dentry_operations afs_fs_dentry_operations;
 
-ssize_t afs_read_single(struct afs_vnode *dvnode, struct file *file);
 ssize_t afs_read_dir(struct afs_vnode *dvnode, struct file *file)
 	__acquires(&dvnode->validate_lock);
 extern void afs_d_release(struct dentry *);
 extern void afs_check_for_remote_deletion(struct afs_operation *);
-int afs_single_writepages(struct address_space *mapping,
-			  struct writeback_control *wbc);
 
 /*
  * dir_edit.c
@@ -1247,10 +1254,6 @@ extern void afs_fs_probe_cleanup(struct afs_net *);
  */
 extern const struct afs_operation_ops afs_fetch_status_operation;
 
-void afs_init_new_symlink(struct afs_vnode *vnode, struct afs_operation *op);
-const char *afs_get_link(struct dentry *dentry, struct inode *inode,
-			 struct delayed_call *callback);
-int afs_readlink(struct dentry *dentry, char __user *buffer, int buflen);
 extern void afs_vnode_commit_status(struct afs_operation *, struct afs_vnode_param *);
 extern int afs_fetch_status(struct afs_vnode *, struct key *, bool, afs_access_t *);
 extern int afs_ilookup5_test_by_fid(struct inode *, void *);
@@ -1600,6 +1603,21 @@ void afs_detach_volume_from_servers(struct afs_volume *volume, struct afs_server
 extern int __init afs_fs_init(void);
 extern void afs_fs_exit(void);
 
+/*
+ * symlink.c
+ */
+extern const struct inode_operations afs_symlink_inode_operations;
+extern const struct address_space_operations afs_symlink_aops;
+
+void afs_invalidate_symlink(struct afs_vnode *vnode);
+void afs_evict_symlink(struct afs_vnode *vnode);
+void afs_init_new_symlink(struct afs_vnode *vnode, struct afs_operation *op);
+const char
*afs_get_link(struct dentry *dentry, struct inode *inode,
+			 struct delayed_call *callback);
+int afs_readlink(struct dentry *dentry, char __user *buffer, int buflen);
+int afs_symlink_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc);
+
 /*
  * validation.c
  */
diff --git a/fs/afs/symlink.c b/fs/afs/symlink.c
new file mode 100644
index 000000000000..ed5868369f37
--- /dev/null
+++ b/fs/afs/symlink.c
@@ -0,0 +1,278 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* AFS filesystem symbolic link handling
+ *
+ * Copyright (C) 2026 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+static void afs_put_symlink(struct afs_symlink *symlink)
+{
+	if (refcount_dec_and_test(&symlink->ref))
+		kfree_rcu(symlink, rcu);
+}
+
+static void afs_replace_symlink(struct afs_vnode *vnode, struct afs_symlink *symlink)
+{
+	struct afs_symlink *old;
+
+	old = rcu_replace_pointer(vnode->symlink, symlink,
+				  lockdep_is_held(&vnode->validate_lock));
+	if (old)
+		afs_put_symlink(old);
+}
+
+/*
+ * In the event that a third-party update of a symlink occurs, dispose of the
+ * copy of the old contents. Called under ->validate_lock.
+ */
+void afs_invalidate_symlink(struct afs_vnode *vnode)
+{
+	afs_replace_symlink(vnode, NULL);
+}
+
+/*
+ * Dispose of a symlink copy during inode deletion.
+ */
+void afs_evict_symlink(struct afs_vnode *vnode)
+{
+	struct afs_symlink *old;
+
+	old = rcu_replace_pointer(vnode->symlink, NULL, true);
+	if (old)
+		afs_put_symlink(old);
+
+}
+
+/*
+ * Set up a locally created symlink inode for immediate write to the cache.
+ */
+void afs_init_new_symlink(struct afs_vnode *vnode, struct afs_operation *op)
+{
+	struct afs_symlink *symlink = op->create.symlink;
+	size_t dsize = 0;
+	size_t size = strlen(symlink->content) + 1;
+	char *p;
+
+	rcu_assign_pointer(vnode->symlink, symlink);
+	op->create.symlink = NULL;
+
+	if (!fscache_cookie_enabled(netfs_i_cookie(&vnode->netfs)))
+		return;
+
+	if (netfs_alloc_folioq_buffer(NULL, &vnode->directory, &dsize, size,
+				      mapping_gfp_mask(vnode->netfs.inode.i_mapping)) < 0)
+		return;
+
+	vnode->directory_size = dsize;
+	p = kmap_local_folio(folioq_folio(vnode->directory, 0), 0);
+	memcpy(p, symlink->content, size);
+	kunmap_local(p);
+	netfs_single_mark_inode_dirty(&vnode->netfs.inode);
+}
+
+/*
+ * Read a symlink in a single download.
+ */
+static ssize_t afs_do_read_symlink(struct afs_vnode *vnode)
+{
+	struct afs_symlink *symlink;
+	struct iov_iter iter;
+	ssize_t ret;
+	loff_t i_size;
+
+	i_size = i_size_read(&vnode->netfs.inode);
+	if (i_size > PAGE_SIZE - 1) {
+		trace_afs_file_error(vnode, -EFBIG, afs_file_error_dir_big);
+		return -EFBIG;
+	}
+
+	if (!vnode->directory) {
+		size_t cur_size = 0;
+
+		ret = netfs_alloc_folioq_buffer(NULL,
+						&vnode->directory, &cur_size, PAGE_SIZE,
+						mapping_gfp_mask(vnode->netfs.inode.i_mapping));
+		vnode->directory_size = PAGE_SIZE - 1;
+		if (ret < 0)
+			return ret;
+	}
+
+	iov_iter_folio_queue(&iter, ITER_DEST, vnode->directory, 0, 0, PAGE_SIZE);
+
+	/* AFS requires us to perform the read of a symlink as a single unit to
+	 * avoid issues with the content being changed between reads.
+	 */
+	ret = netfs_read_single(&vnode->netfs.inode, NULL, &iter);
+	if (ret >= 0) {
+		i_size = ret;
+		if (i_size > PAGE_SIZE - 1) {
+			trace_afs_file_error(vnode, -EFBIG, afs_file_error_dir_big);
+			return -EFBIG;
+		}
+		vnode->directory_size = i_size;
+
+		/* Copy the symlink.
*/
+		symlink = kmalloc_flex(struct afs_symlink, content, i_size + 1,
+				       GFP_KERNEL);
+		if (!symlink)
+			return -ENOMEM;
+
+		refcount_set(&symlink->ref, 1);
+		symlink->content[i_size] = 0;
+
+		const char *s = kmap_local_folio(folioq_folio(vnode->directory, 0), 0);
+
+		memcpy(symlink->content, s, i_size);
+		kunmap_local(s);
+
+		afs_replace_symlink(vnode, symlink);
+	}
+
+	if (!fscache_cookie_enabled(netfs_i_cookie(&vnode->netfs))) {
+		netfs_free_folioq_buffer(vnode->directory);
+		vnode->directory = NULL;
+		vnode->directory_size = 0;
+	}
+
+	return ret;
+}
+
+static ssize_t afs_read_symlink(struct afs_vnode *vnode)
+{
+	ssize_t ret;
+
+	fscache_use_cookie(afs_vnode_cache(vnode), false);
+	ret = afs_do_read_symlink(vnode);
+	fscache_unuse_cookie(afs_vnode_cache(vnode), NULL, NULL);
+	return ret;
+}
+
+static void afs_put_link(void *arg)
+{
+	afs_put_symlink(arg);
+}
+
+const char *afs_get_link(struct dentry *dentry, struct inode *inode,
+			 struct delayed_call *callback)
+{
+	struct afs_symlink *symlink;
+	struct afs_vnode *vnode = AFS_FS_I(inode);
+	ssize_t ret;
+
+	if (!dentry) {
+		/* RCU pathwalk.
*/
+		symlink = rcu_dereference(vnode->symlink);
+		if (!symlink || !afs_check_validity(vnode))
+			return ERR_PTR(-ECHILD);
+		set_delayed_call(callback, NULL, NULL);
+		return symlink->content;
+	}
+
+	if (vnode->symlink) {
+		ret = afs_validate(vnode, NULL);
+		if (ret < 0)
+			return ERR_PTR(ret);
+
+		down_read(&vnode->validate_lock);
+		if (vnode->symlink)
+			goto good;
+		up_read(&vnode->validate_lock);
+	}
+
+	if (down_write_killable(&vnode->validate_lock) < 0)
+		return ERR_PTR(-ERESTARTSYS);
+	if (!vnode->symlink) {
+		ret = afs_read_symlink(vnode);
+		if (ret < 0) {
+			up_write(&vnode->validate_lock);
+			return ERR_PTR(ret);
+		}
+	}
+
+	downgrade_write(&vnode->validate_lock);
+
+good:
+	symlink = rcu_dereference_protected(vnode->symlink,
+					    lockdep_is_held(&vnode->validate_lock));
+	refcount_inc(&symlink->ref);
+	up_read(&vnode->validate_lock);
+
+	set_delayed_call(callback, afs_put_link, symlink);
+	return symlink->content;
+}
+
+int afs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
+{
+	DEFINE_DELAYED_CALL(done);
+	const char *content;
+	int len;
+
+	content = afs_get_link(dentry, d_inode(dentry), &done);
+	if (IS_ERR(content)) {
+		do_delayed_call(&done);
+		return PTR_ERR(content);
+	}
+
+	len = umin(strlen(content), buflen);
+	if (copy_to_user(buffer, content, len))
+		len = -EFAULT;
+	do_delayed_call(&done);
+	return len;
+}
+
+/*
+ * Write the symlink contents to the cache as a single blob. We then throw
+ * away the page we used to receive it.
+ */
+int afs_symlink_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
+	struct iov_iter iter;
+	int ret = 0;
+
+	if (!down_read_trylock(&vnode->validate_lock)) {
+		if (wbc->sync_mode == WB_SYNC_NONE) {
+			/* The VFS will have undirtied the inode.
*/
+			netfs_single_mark_inode_dirty(&vnode->netfs.inode);
+			return 0;
+		}
+		down_read(&vnode->validate_lock);
+	}
+
+	if (vnode->directory &&
+	    atomic64_read(&vnode->cb_expires_at) != AFS_NO_CB_PROMISE) {
+		iov_iter_folio_queue(&iter, ITER_SOURCE, vnode->directory, 0, 0,
+				     i_size_read(&vnode->netfs.inode));
+		ret = netfs_writeback_single(mapping, wbc, &iter);
+	}
+
+	if (ret == 0) {
+		mutex_lock(&vnode->netfs.wb_lock);
+		netfs_free_folioq_buffer(vnode->directory);
+		vnode->directory = NULL;
+		vnode->directory_size = 0;
+		mutex_unlock(&vnode->netfs.wb_lock);
+	} else if (ret == 1) {
+		ret = 0; /* Skipped write due to lock conflict. */
+	}
+
+	up_read(&vnode->validate_lock);
+	return ret;
+}
+
+const struct inode_operations afs_symlink_inode_operations = {
+	.get_link = afs_get_link,
+	.readlink = afs_readlink,
+};
+
+const struct address_space_operations afs_symlink_aops = {
+	.writepages = afs_symlink_writepages,
+};
diff --git a/fs/afs/validation.c b/fs/afs/validation.c
index 0ba8336c9025..e997563af658 100644
--- a/fs/afs/validation.c
+++ b/fs/afs/validation.c
@@ -465,11 +465,17 @@ int afs_validate(struct afs_vnode *vnode, struct key *key)
 	vnode->cb_ro_snapshot = cb_ro_snapshot;
 	vnode->cb_scrub = cb_scrub;
 
-	/* if the vnode's data version number changed then its contents are
-	 * different */
+	/* If the vnode's data version number changed then its contents are
+	 * different. Note that afs_apply_status() doesn't set ZAP_DATA on
+	 * directories.
+	 */
 	zap |= test_and_clear_bit(AFS_VNODE_ZAP_DATA, &vnode->flags);
-	if (zap)
-		afs_zap_data(vnode);
+	if (zap) {
+		if (S_ISREG(vnode->netfs.inode.i_mode))
+			afs_zap_data(vnode);
+		else if (S_ISLNK(vnode->netfs.inode.i_mode))
+			afs_invalidate_symlink(vnode);
+	}
 	up_write(&vnode->validate_lock);
 	_leave(" = 0");
 	return 0;
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 24fb562ebd33..d941179730a9 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -960,7 +960,7 @@ void yfs_fs_symlink(struct afs_operation *op)
 
 	_enter("");
 
-	contents_sz = strlen(op->create.symlink);
+	contents_sz = strlen(op->create.symlink->content);
 	call = afs_alloc_flat_call(op->net, &yfs_RXYFSSymlink,
 				   sizeof(__be32) +
 				   sizeof(struct yfs_xdr_RPCFlags) +
@@ -981,7 +981,7 @@ void yfs_fs_symlink(struct afs_operation *op)
 	bp = xdr_encode_u32(bp, 0); /* RPC flags */
 	bp = xdr_encode_YFSFid(bp, &dvp->fid);
 	bp = xdr_encode_name(bp, name);
-	bp = xdr_encode_string(bp, op->create.symlink, contents_sz);
+	bp = xdr_encode_string(bp, op->create.symlink->content, contents_sz);
 	bp = xdr_encode_YFSStoreStatus(bp, &mode, &op->mtime);
 	yfs_check_req(call, bp);