[PATCH] iomap: move prefaulting out of hot write path

alexjlzheng@gmail.com posted 1 patch 2 months, 1 week ago
There is a newer version of this series
Posted by alexjlzheng@gmail.com 2 months, 1 week ago
From: Jinliang Zheng <alexjlzheng@tencent.com>

Prefaulting the write source buffer incurs an extra userspace access
in the common fast path. Make iomap_write_iter() consistent with
generic_perform_write(): only touch userspace an extra time when
copy_folio_from_iter_atomic() has failed to make progress.

This patch is inspired by commit 665575cff098 ("filemap: move
prefaulting out of hot write path").

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
 fs/iomap/buffered-io.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8b847a1e27f1..6e6573fce78a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -972,21 +972,6 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 		if (bytes > iomap_length(iter))
 			bytes = iomap_length(iter);
 
-		/*
-		 * Bring in the user page that we'll copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 *
-		 * For async buffered writes the assumption is that the user
-		 * page has already been faulted in. This can be optimized by
-		 * faulting the user page.
-		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
-			status = -EFAULT;
-			break;
-		}
-
 		status = iomap_write_begin(iter, write_ops, &folio, &offset,
 				&bytes);
 		if (unlikely(status)) {
@@ -1001,6 +986,12 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);
 
+		/*
+		 * Faults here on mmap()s can recurse into arbitrary
+		 * filesystem code. Lots of locks are held that can
+		 * deadlock. Use an atomic copy to avoid deadlocking
+		 * in page fault handling.
+		 */
 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
 		written = iomap_write_end(iter, bytes, copied, folio) ?
 			  copied : 0;
@@ -1039,6 +1030,16 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 				bytes = copied;
 				goto retry;
 			}
+
+			/*
+			 * 'folio' is now unlocked and faults on it can be
+			 * handled. Ensure forward progress by trying to
+			 * fault it in now.
+			 */
+			if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+				status = -EFAULT;
+				break;
+			}
 		} else {
 			total_written += written;
 			iomap_iter_advance(iter, &written);
-- 
2.49.0
Re: [PATCH] iomap: move prefaulting out of hot write path
Posted by Darrick J. Wong 2 months, 1 week ago
On Thu, Oct 09, 2025 at 05:08:51PM +0800, alexjlzheng@gmail.com wrote:
> From: Jinliang Zheng <alexjlzheng@tencent.com>
> 
> Prefaulting the write source buffer incurs an extra userspace access
> in the common fast path. Make iomap_write_iter() consistent with
> generic_perform_write(): only touch userspace an extra time when
> copy_folio_from_iter_atomic() has failed to make progress.
> 
> This patch is inspired by commit 665575cff098 ("filemap: move
> prefaulting out of hot write path").

Seems fine to me, but I wonder if dhansen has any thoughts about this
patch ... which exactly mirrors one he sent eight months ago?

--D

> Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> ---
>  fs/iomap/buffered-io.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8b847a1e27f1..6e6573fce78a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -972,21 +972,6 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
>  		if (bytes > iomap_length(iter))
>  			bytes = iomap_length(iter);
>  
> -		/*
> -		 * Bring in the user page that we'll copy from _first_.
> -		 * Otherwise there's a nasty deadlock on copying from the
> -		 * same page as we're writing to, without it being marked
> -		 * up-to-date.
> -		 *
> -		 * For async buffered writes the assumption is that the user
> -		 * page has already been faulted in. This can be optimized by
> -		 * faulting the user page.
> -		 */
> -		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
> -			status = -EFAULT;
> -			break;
> -		}
> -
>  		status = iomap_write_begin(iter, write_ops, &folio, &offset,
>  				&bytes);
>  		if (unlikely(status)) {
> @@ -1001,6 +986,12 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
>  		if (mapping_writably_mapped(mapping))
>  			flush_dcache_folio(folio);
>  
> +		/*
> +		 * Faults here on mmap()s can recurse into arbitrary
> +		 * filesystem code. Lots of locks are held that can
> +		 * deadlock. Use an atomic copy to avoid deadlocking
> +		 * in page fault handling.
> +		 */
>  		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
>  		written = iomap_write_end(iter, bytes, copied, folio) ?
>  			  copied : 0;
> @@ -1039,6 +1030,16 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
>  				bytes = copied;
>  				goto retry;
>  			}
> +
> +			/*
> +			 * 'folio' is now unlocked and faults on it can be
> +			 * handled. Ensure forward progress by trying to
> +			 * fault it in now.
> +			 */
> +			if (fault_in_iov_iter_readable(i, bytes) == bytes) {
> +				status = -EFAULT;
> +				break;
> +			}
>  		} else {
>  			total_written += written;
>  			iomap_iter_advance(iter, &written);
> -- 
> 2.49.0
> 
>
Re: [PATCH] iomap: move prefaulting out of hot write path
Posted by Dave Hansen 2 months, 1 week ago
On 10/9/25 08:01, Darrick J. Wong wrote:
> On Thu, Oct 09, 2025 at 05:08:51PM +0800, alexjlzheng@gmail.com wrote:
>> From: Jinliang Zheng <alexjlzheng@tencent.com>
>>
>> Prefaulting the write source buffer incurs an extra userspace access
>> in the common fast path. Make iomap_write_iter() consistent with
>> generic_perform_write(): only touch userspace an extra time when
>> copy_folio_from_iter_atomic() has failed to make progress.
>>
>> This patch is inspired by commit 665575cff098 ("filemap: move
>> prefaulting out of hot write path").
> Seems fine to me, but I wonder if dhansen has any thoughts about this
> patch ... which exactly mirrors one he sent eight months ago?

I don't _really_ care all that much. But, yeah, I would have expected
a little shout-out or something when someone copies the changelog and
code verbatim from another patch:

	https://lore.kernel.org/lkml/20250129181753.3927F212@davehans-spike.ostc.intel.com/

and then copies a comment from a second patch I did.

But I guess I was cc'd at least. Also, if my name isn't on this one,
then I don't have to fix any of the bugs it causes. Right? ;)

Just one warning: be on the lookout for bugs in the area. The
prefaulting definitely does a good job of hiding bugs in other bits
of the code. The generic_perform_write() gunk seems to have uncovered
a bug or two.

Also, didn't Christoph ask you to make the comments wider the last
time Alex posted this? I don't think that got changed.

	https://lore.kernel.org/lkml/aIt8BYa6Ti6SRh8C@infradead.org/

Overall, the change still seems as valid to me as it did when I wrote the
patch in the first place. Although it feels funny to ack my own
patch.
Re: [PATCH] iomap: move prefaulting out of hot write path
Posted by Jinliang Zheng 2 months, 1 week ago
> On 10/9/25 08:01, Darrick J. Wong wrote:
> > On Thu, Oct 09, 2025 at 05:08:51PM +0800, alexjlzheng@gmail.com wrote:
> >> From: Jinliang Zheng <alexjlzheng@tencent.com>
> >>
> >> Prefaulting the write source buffer incurs an extra userspace access
> >> in the common fast path. Make iomap_write_iter() consistent with
> >> generic_perform_write(): only touch userspace an extra time when
> >> copy_folio_from_iter_atomic() has failed to make progress.
> >>
> >> This patch is inspired by commit 665575cff098 ("filemap: move
> >> prefaulting out of hot write path").
> > Seems fine to me, but I wonder if dhansen has any thoughts about this
> > patch ... which exactly mirrors one he sent eight months ago?
> 
> I don't _really_ care all that much. But, yeah, I would have expected
> a little shout-out or something when someone copies the changelog and
> code verbatim from another patch:
> 
> 	https://lore.kernel.org/lkml/20250129181753.3927F212@davehans-spike.ostc.intel.com/
> 
> and then copies a comment from a second patch I did.

Sorry for forgetting to CC you in my previous email.

When I sent V1[1], I hadn't come across this email (which was an oversight on my part):
- https://lore.kernel.org/lkml/20250129181753.3927F212@davehans-spike.ostc.intel.com/

At that time, I was quite puzzled about why generic_perform_write() had moved prefaulting
out of the hot write path, while iomap_write_iter() had not done the same.

It wasn't until I was preparing V2[2] that I found the email above. However, the surrounding
code had already changed by then, so I rebased the code from that email onto the
upstream version. My apologies for forgetting to CC you earlier.

[1] https://lore.kernel.org/linux-xfs/20250726090955.647131-2-alexjlzheng@tencent.com/
[2] https://lore.kernel.org/linux-xfs/20250730164408.4187624-2-alexjlzheng@tencent.com/

Hope you know I didn't mean any offense. Sorry about that.

> 
> But I guess I was cc'd at least. Also, if my name isn't on this one,
> then I don't have to fix any of the bugs it causes. Right? ;)
> 
> Just one warning: be on the lookout for bugs in the area. The
> prefaulting definitely does a good job of hiding bugs in other bits
> of the code. The generic_perform_write() gunk seems to have uncovered
> a bug or two.

Indeed, the reason I sent this patch was precisely that I was unsure why the change
for iomap_write_iter() hadn't been merged like the one for generic_perform_write(); I
wondered whether there might be some underlying issue. I hoped to get everyone's thoughts
through this patch. :)

> 
> Also, didn't Christoph ask you to make the comments wider the last
> time Alex posted this? I don't think that got changed.
> 
> 	https://lore.kernel.org/lkml/aIt8BYa6Ti6SRh8C@infradead.org/
> 
> Overall, the change still seems as valid to me as it did when I wrote the
> patch in the first place. Although it feels funny to ack my own
> patch.

If moving prefaulting out of the hot write path in iomap_write_iter() is indeed
acceptable, would you mind taking the time to rebase the code from your patch onto
the latest upstream version and submit a new patch? After all, you are the
original author of the change. :)

Thank you very much,
Jinliang. :)
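
For readers following the thread, the control flow under discussion (try
the atomic copy first, and fault in the source buffer only when the copy
made no progress) can be sketched as a small userspace C model. This is
only an illustration under stated assumptions, not the kernel code:
sim_src, sim_copy_atomic(), sim_fault_in() and sim_write() are
hypothetical names, pages are modeled as 4-byte chunks, and
sim_fault_in() merely imitates the fault_in_iov_iter_readable()
convention of returning the number of bytes it could not fault in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE 4u	/* hypothetical "page size" for the model */

/* Hypothetical stand-in for a user buffer whose pages may not be resident. */
struct sim_src {
	const char *data;
	size_t len;		/* must not exceed 8 * SIM_PAGE */
	bool resident[8];	/* page can be read without faulting */
	bool bad[8];		/* page can never be faulted in */
	unsigned int fault_ins;	/* how many times the slow path ran */
};

/* Models an atomic copy: stops at the first non-resident page. */
size_t sim_copy_atomic(char *dst, struct sim_src *s, size_t pos, size_t bytes)
{
	size_t done = 0;

	while (done < bytes && s->resident[(pos + done) / SIM_PAGE]) {
		dst[done] = s->data[pos + done];
		done++;
	}
	return done;
}

/* Models the fault-in helper: returns the number of bytes NOT faulted in. */
size_t sim_fault_in(struct sim_src *s, size_t pos, size_t bytes)
{
	size_t done = 0;

	s->fault_ins++;
	while (done < bytes) {
		size_t page = (pos + done) / SIM_PAGE;

		if (s->bad[page])
			break;
		s->resident[page] = true;
		done++;
	}
	return bytes - done;
}

/* Write loop: optimistic copy, fault in only after zero progress. */
long sim_write(char *dst, struct sim_src *s)
{
	size_t pos = 0;

	while (pos < s->len) {
		size_t bytes = s->len - pos;
		size_t copied = sim_copy_atomic(dst + pos, s, pos, bytes);

		if (copied == 0) {
			/* Slow path: nothing could be faulted in, give up. */
			if (sim_fault_in(s, pos, bytes) == bytes)
				return pos ? (long)pos : -EFAULT;
			continue;	/* retry the copy */
		}
		pos += copied;
	}
	return (long)pos;
}
```

In this model, a source whose pages are all resident never reaches
sim_fault_in(); the extra access happens only after a copy that returns
0, which is the property the patch is after.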